← Back to BrewedIntel
AI Security · Software Vulnerability

Mar 11, 2026 • Project Discovery

Inside the benchmark: app architectures, walkthroughs of findings, and what each scanner actually caught


Source
Project Discovery
Category
other
Severity
low

Executive Summary

This article presents Part 2 of a benchmark study evaluating security scanners against AI-generated code. The research compares LLM-based security tools, specifically ProjectDiscovery's Neo and Claude Code, with traditional SAST and DAST scanners. The LLM-based tools identified high-value security findings that conventional scanners often missed. Neo outperformed Claude Code, yielding more true positives and fewer false positives because it validates its hypotheses against the running application. No specific threat actors or malware families are identified; rather, the study highlights the evolving landscape of application security testing. Organizations that use AI for code generation should consider integrating LLM-based security tools to improve vulnerability detection, and the research underscores the need for advanced scanning methodologies to mitigate the risks of vibe coding and automated development pipelines.

Summary

This is Part 2 of our vibe coding security benchmark study. In Part 1, we compared how LLM-based security tools, ProjectDiscovery's Neo and Claude Code, performed against traditional SAST and DAST scanners on AI-generated code. We found that the LLM-based tools detected many high-value findings that traditional scanners missed. Between the two, Neo produced more true positives and fewer false positives because it could validate its hypotheses against a running app.
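The validation idea described above can be sketched in a few lines: a static hypothesis (e.g. "this parameter looks injectable") is only reported as a finding if a live probe of the running application confirms the expected behavior. This is a minimal illustrative sketch of that triage loop, not ProjectDiscovery's actual implementation; all names, fields, and sample responses here are hypothetical.

```python
# Hypothetical sketch: confirm a scanner hypothesis against a live response
# before reporting it, which is how dynamic validation cuts false positives.
from dataclasses import dataclass


@dataclass
class Finding:
    name: str      # human-readable label, e.g. "reflected XSS in ?q="
    probe: str     # marker payload that would be sent to the running app
    evidence: str  # substring that must appear in the live response


def validate(finding: Finding, live_response: str) -> bool:
    """Report the finding only if the running app echoes the evidence."""
    return finding.evidence in live_response


# Two candidate findings a purely static pass might raise:
xss = Finding("reflected XSS", probe="<zz123>", evidence="<zz123>")
sqli = Finding("error-based SQLi", probe="'", evidence="SQL syntax")

# Simulated responses from the running application:
print(validate(xss, "Results for <zz123>: nothing found"))   # confirmed
print(validate(sqli, "Results for ': nothing found"))        # not confirmed
```

A tool that stops at the hypothesis stage must report both candidates (inflating false positives), while the validation step lets it drop the unconfirmed one.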
