Tags: vulnerability · low · AI Security Testing · Vulnerability Discovery

Mar 20, 2026 • Project Discovery

Everyone is finding vulns. The hard part is proving them.


Source: Project Discovery
Category: vulnerability
Severity: low

Executive Summary

This article discusses the emerging role of Large Language Models (LLMs) in vulnerability discovery rather than detailing a specific cyber threat campaign. Vendors like Anthropic and OpenAI are using AI models, such as Opus 4.6 and Codex Security, to identify zero-day vulnerabilities in critical infrastructure projects like OpenSSH and GnuTLS. While these tools show significant potential for automating security testing, the report highlights practical challenges for security teams, including high noise rates and the need for human validation. No specific threat actors or malware families are identified in this context. Consequently, immediate mitigation strategies focus on integrating AI-assisted testing carefully into existing vulnerability management workflows. Organizations should also be aware that adversaries may use AI for offensive vulnerability discovery in the same way, potentially accelerating exploit development cycles against unpatched systems in the near future.

Summary

LLMs are a genuine leap forward for vulnerability discovery. Anthropic reported 500+ zero-days from Opus 4.6, and OpenAI's Codex Security discovered 14 CVEs across projects like OpenSSH and GnuTLS. If you've experimented with LLMs for security testing, you've probably been impressed too. The practical reality for a security team deploying AI is messier than the headlines or early POC results suggest. Noise compounds fast. Anthropic brought in external security researchers to help validate the vo…
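The validation bottleneck described above can be sketched as a minimal triage step. This is a hypothetical illustration, not any vendor's actual workflow: all names, fields, and the routing rule are assumptions. The idea is simply that AI-reported findings get deduplicated and queued for human review rather than accepted automatically.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A single AI-reported vulnerability candidate (all fields hypothetical)."""
    tool: str             # model or scanner that reported it
    component: str        # affected project/file
    description: str
    has_reproducer: bool  # did the tool supply a working proof of concept?

def triage(findings):
    """Deduplicate AI findings and split them into a human-review queue and
    a backlog of unsubstantiated reports. Nothing is auto-accepted: every
    candidate still requires analyst validation."""
    seen = set()
    review_queue, backlog = [], []
    for f in findings:
        key = (f.component, f.description)
        if key in seen:   # drop verbatim duplicates to cut noise
            continue
        seen.add(key)
        # Findings that ship with a reproducer go to humans first; the rest wait.
        (review_queue if f.has_reproducer else backlog).append(f)
    return review_queue, backlog

findings = [
    Finding("model-a", "libfoo/parse.c", "possible OOB read", True),
    Finding("model-a", "libfoo/parse.c", "possible OOB read", True),  # duplicate
    Finding("model-b", "libbar/tls.c", "suspected use-after-free", False),
]
queue, backlog = triage(findings)
print(len(queue), len(backlog))  # → 1 1
```

Even a filter this crude makes the article's point concrete: the expensive resource is analyst time, so the pipeline's job is to spend it only on deduplicated, substantiated candidates.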
