
Claude Just Found 22 Firefox Vulnerabilities in Two Weeks—More Than Any Month in 2025
Anthropic has released details of a collaboration with Mozilla that demonstrates just how capable AI has become at finding security flaws. Over two weeks, Claude Opus 4.6 discovered 22 vulnerabilities in Firefox—14 of which Mozilla classified as high severity. That's nearly 20 percent of all high-severity Firefox vulnerabilities remediated in all of 2025.
Fixes for the findings have already shipped to hundreds of millions of users in Firefox 148.
Why Firefox
Mozilla's browser was a deliberate test case. It's one of the most well-tested and secure open-source projects in the world, with a complex codebase and hundreds of millions of daily users. Browser vulnerabilities are particularly dangerous because users routinely encounter untrusted content and depend on the browser to keep them safe. If Claude could find novel bugs here, it could find them anywhere.
The Discovery Process
The team started by testing whether Claude could reproduce previously identified Firefox CVEs in older codebase versions. Opus 4.6 succeeded at a high rate—but that alone wasn't proof, since those historical vulnerabilities might have been in Claude's training data.
So they pointed it at the current Firefox codebase and let it run. Within twenty minutes of exploration, Claude reported a use-after-free vulnerability in the JavaScript engine, a memory flaw that could allow attackers to overwrite data with malicious content. By the time researchers had validated the bug and filed a report, Claude had already discovered fifty more crashing inputs.
By the end, Claude had scanned nearly 6,000 C++ files and generated 112 unique reports. Mozilla encouraged bulk submission, and the team obliged—even when not every crashing test case had clear security implications.
The Exploit Question
Anthropic also tested whether Claude could turn those vulnerabilities into working exploits—the kind of tools hackers would use to execute malicious code. They ran hundreds of attempts, spent roughly $4,000 in API credits, and succeeded exactly twice.
The takeaway is twofold. First, Claude is far better at finding bugs than at exploiting them. Second, identifying vulnerabilities costs roughly an order of magnitude less than weaponizing them. For now, defenders have the advantage.
But the fact that Claude succeeded at all, even in controlled environments with some security features removed, is a warning. Firefox's sandbox mitigated these specific exploits, but sandbox escapes aren't unheard of. Claude built one necessary component of an end-to-end attack.
The Partnership Model
Mozilla's response offers a template for how maintainers and AI researchers can work together. The Firefox team highlighted three elements that made Claude's submissions trustworthy:
· Minimal test cases accompanying each report
· Detailed proofs-of-concept
· Candidate patches generated by Claude and validated by humans
Anthropic is publishing its Coordinated Vulnerability Disclosure principles and encouraging researchers using AI-powered tools to include similar verification evidence when submitting reports.
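As a rough illustration of the three elements listed above, a submission can be modeled as a small record with a completeness check. This is only a sketch: the VulnReport class, its field names, and the missing_evidence helper are hypothetical and do not come from Mozilla or Anthropic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VulnReport:
    """Hypothetical record for one AI-assisted vulnerability report."""
    title: str
    minimal_test_case: Optional[str] = None  # smallest input that triggers the bug
    proof_of_concept: Optional[str] = None   # detailed demonstration of the issue
    candidate_patch: Optional[str] = None    # proposed fix, e.g. a unified diff
    patch_human_validated: bool = False      # has a person reviewed the patch?


def missing_evidence(report: VulnReport) -> List[str]:
    """Return the evidence items the report still lacks, in checklist order."""
    missing = []
    if not report.minimal_test_case:
        missing.append("minimal test case")
    if not report.proof_of_concept:
        missing.append("proof of concept")
    if not report.candidate_patch:
        missing.append("candidate patch")
    elif not report.patch_human_validated:
        # A patch exists but nobody has signed off on it yet.
        missing.append("human validation of patch")
    return missing
```

A submitter could run a check like this before filing, and hold back any report for which the returned list is not empty.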
What Works: Task Verifiers
The team found that Claude performs best when it can check its own work with external tools—what they call "task verifiers." These give the agent real-time feedback as it explores a codebase, allowing deep iteration until it succeeds.
For patching, good verifiers need to confirm two things: that the vulnerability is actually gone, and that the program's intended functionality is preserved. Anthropic built tools to test both. The result: dramatically improved patch quality, even if not every AI-generated patch is merge-ready without human review.
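The two checks described above can be sketched as a small script. This is a minimal illustration, not Anthropic's tooling: the function names, the result type, and the convention that the reproducer command exits with status 0 once the bug no longer triggers are all assumptions made for the example.

```python
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VerifierResult:
    vulnerability_gone: bool
    functionality_preserved: bool

    @property
    def passed(self) -> bool:
        # A patch is acceptable only if both checks succeed.
        return self.vulnerability_gone and self.functionality_preserved


def command_succeeds(cmd: Sequence[str], timeout_s: int = 60) -> bool:
    """Run a command; True only if it exits with status 0 within the timeout."""
    try:
        completed = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a command that cannot be started counts as a failure.
        return False
    return completed.returncode == 0


def verify_patch(
    reproducer_cmd: Sequence[str],
    regression_cmds: Sequence[Sequence[str]],
) -> VerifierResult:
    """Check a patched build against a crash reproducer and regression tests."""
    # Check 1 (assumed convention): the reproducer exits 0 when the
    # previously crashing input no longer crashes the patched build.
    gone = command_succeeds(reproducer_cmd)
    # Check 2: every regression command must still pass.
    preserved = all(command_succeeds(cmd) for cmd in regression_cmds)
    return VerifierResult(vulnerability_gone=gone, functionality_preserved=preserved)
```

Keeping the two results separate, rather than returning a single boolean, lets an agent see which check failed and iterate on that specific problem. Note that an empty list of regression commands passes the second check trivially, so a real verifier would need a meaningful test suite.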
Opus 4.6 is currently far better at finding and fixing vulnerabilities than exploiting them. That gap gives defenders an advantage. Anthropic is already bringing these capabilities to customers and maintainers through Claude Code Security.
But the gap is unlikely to last. "If and when future language models break through this exploitation barrier, we will need to consider additional safeguards," the company warns. The message to developers: use this window to make your software more secure.
Anthropic plans to significantly expand its cybersecurity efforts—working with developers to find vulnerabilities, building tools to help maintainers triage reports, and proposing patches. The Firefox collaboration is a proof point. The race is just beginning.
About the Author

Leo Silva
Leo Silva is an Air correspondent from Brazil.