Claude Mythos and Project Glasswing Are Research Breakthroughs, Not Security Programs

Anthropic's Claude Mythos and Project Glasswing prove AI can find vulnerabilities faster than humans, but discovering bugs in open-source code is a different problem from continuously validating exploitability, delivering fixes, and retesting them across living production environments.

Ido Geffen, Co-Founder & CEO

5 mins


Anthropic Validates the Case for AI Vulnerability and Exploit Scanning

Anthropic’s Claude Mythos and Project Glasswing together mark a significant moment for the security industry, not because of any one vulnerability Mythos found, but because of what they confirm about the trajectory we’re on.

Anthropic announced this week that their new frontier model, Claude Mythos, has already identified thousands of high-severity vulnerabilities across every major operating system and every major web browser, many of which had survived decades of human review. For example, their own researchers found a 27-year-old bug in OpenBSD, a system renowned for its security hardening. Mythos autonomously wrote a remote code execution exploit against FreeBSD’s NFS server that granted full root access to unauthenticated users. In live-browser benchmarks, it turned known vulnerabilities into working exploits at rates that dwarf anything previous models could achieve.

Anthropic is championing Mythos as proof positive that AI models have crossed the threshold where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.

We agree. The big question then becomes: how do we get that capability out of research labs and into the hands of actual security teams?

The Limitations of a Frontier Model (Even Mythos) for Day-to-Day Continuous Offensive Security Testing

Project Glasswing is impressive in scope: Anthropic is committing up to $100 million in usage credits, and AWS, Apple, Microsoft, Google, Cisco, CrowdStrike, and others are participating. The goal – to find and patch zero-day vulnerabilities in the world’s most critical infrastructure software – is exactly the kind of coordinated action the industry needs.

But attackers don’t need Anthropic’s most advanced code reasoning model to find exploitable vulnerabilities. Capable offensive AI is already in the wild, not locked behind any paywalls or early access windows, and it’s improving every day. 

There’s also a structural gap between what Glasswing addresses and what most security teams face day to day. Mythos is optimized for deep source-code analysis of widely used open-source projects: Linux kernels, browser engines, cryptographic libraries. That work is vital.

But the vulnerabilities that put most enterprises at risk aren’t 27-year-old bugs in OpenBSD. They’re business logic flaws in custom applications. They’re authorization bypasses in SaaS integrations. They’re chained weaknesses across APIs that only surface when someone understands how the specific system is supposed to work, and probes it in production.

Even at the frontier of what Mythos can do, there are meaningful boundaries. According to Anthropic’s own system card, Mythos was tested against cyber ranges simulating real enterprise and operational technology environments. However, in a properly configured sandbox with modern patches, it failed to find novel exploits. The FreeBSD NFS exploit, while genuinely remarkable, occurred under unusually favorable conditions: no kernel address randomization and none of the defense-in-depth mitigations that would normally be present. These are real limitations, and they matter when you’re trying to assess how this translates to defending live production environments with layered security controls.

The cost profile is also worth noting. Anthropic disclosed that across a thousand runs through their scaffold against OpenBSD, the total cost was under $20,000, and the specific run that discovered the most critical vulnerability cost under $50. But as they acknowledge, that number only makes sense in hindsight. You can’t know in advance which run will succeed. 

This is the economics of vulnerability research, not the economics of continuous security validation.
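To make the hindsight point concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the upper bounds quoted above ($20,000, a thousand runs, $50). The number of findings is not stated in this article, so the single-finding case below is purely an assumption for illustration, and the function names are ours.

```python
# Upper bounds quoted in the article ("under $20,000", "under $50").
TOTAL_COST_USD = 20_000
TOTAL_RUNS = 1_000
WINNING_RUN_COST_USD = 50


def average_cost_per_run(total_cost: float, runs: int) -> float:
    """Average spend per run across the whole campaign."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return total_cost / runs


def cost_per_finding(total_cost: float, findings: int) -> float:
    """Campaign cost divided by findings: unsuccessful runs are still paid for."""
    if findings <= 0:
        raise ValueError("findings must be positive")
    return total_cost / findings


avg = average_cost_per_run(TOTAL_COST_USD, TOTAL_RUNS)

# ASSUMPTION (not from the article): count only the single most critical
# vulnerability as the outcome the campaign was paying for.
per_finding = cost_per_finding(TOTAL_COST_USD, findings=1)

# How much larger the campaign bill is than the one run that paid off.
ratio = per_finding / WINNING_RUN_COST_USD

print(f"average per run: ${avg:,.2f}")
print(f"cost per finding (assumed 1 finding): ${per_finding:,.2f}")
print(f"campaign cost vs. winning run: {ratio:.0f}x")
```

Under that assumption, the budget you would have to commit up front is the campaign total, not the price of the one run that happened to succeed.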

Mythos Finds Bugs. Your Business Needs to Fix Them.

One of the most telling data points from the Glasswing announcement: fewer than 1% of the vulnerabilities Mythos has found so far have been patched. That’s not a failure of the model; it’s a reflection of how slow the disclosure-to-remediation pipeline actually is. Discovering a vulnerability is necessary. Confirming it’s exploitable, communicating it to the right team, and verifying the fix – all at the speed that modern attackers operate – is a fundamentally different problem.

This is the part of the offensive security equation that doesn’t get solved by a more powerful model. It gets solved by architecture: a system that discovers vulnerabilities, validates their exploitability in your live environment, and delivers remediation guidance tailored to your specific stack. Attack and defense in one continuous loop.

What This Means for Security Leaders

Mythos should be taken seriously, both as a capability and as a signal of what’s coming. If this model can do what Anthropic has demonstrated today, models with similar capabilities will proliferate. The rate of AI progress means these capabilities won’t stay bottled up behind a limited access program for long. 

But security teams operating on running systems today can’t wait for cutting-edge research to become operational reality.

The only sustainable response is to stop treating security testing as a quarterly ritual and start treating it as a continuous operational discipline. Security validation needs to operate on the same cadence as AI-enabled attacks.

The race is well underway, and it’s time to lean on the accelerator.

Finding vulnerabilities is only the first step. Novee closes the full loop: continuous discovery, validated exploitability, personalized remediation, and automatic retesting, all powered by a proprietary AI model built on real attacker tradecraft. See what your attackers already know. Book a demo.
