Claude Code Security represents a tremendous move forward for AI code scanning, but finding vulnerabilities in static codebases – even at machine speed – is not how real attackers operate.
AI SAST for AI Code Is a Much-Needed Step Forward – But It’s Not Enough
Anthropic’s release of Claude Code Security and its Frontier Red Team’s research mark an important moment for application security.
Legacy code analysis is long overdue for an AI upgrade (AI SAST is already getting attention from leading analysts, like Forrester). The idea that advanced reasoning models can uncover complex vulnerabilities directly in the codebase – issues that humans might overlook – is meaningful progress.
It is also only part of the story. Code-level analysis, even when powered by frontier AI, does not fully capture how systems are actually attacked.
Real adversaries do not exploit source code in isolation. They target live systems with real configurations, authentication flows, feature flags, third-party integrations, and evolving business logic. They chain weaknesses together. They probe runtime behavior. They adapt based on system feedback.
In practice, this distinction matters more than it might appear.
AI Operating on Code Is Powerful; AI Operating on Real-World Applications Is Transformative
Many issues that look critical in static analysis prove non-exploitable in production.
At the same time, some of the most damaging vulnerabilities – business logic flaws, state manipulation bugs, and multi-step chained exploits – rarely surface through code inspection alone. These vulnerabilities emerge only when a system is exercised dynamically, the way an attacker would.
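To make that concrete, here is a deliberately simplified sketch of the kind of flaw we mean. The framework, route, and pricing logic are purely illustrative, not drawn from any real engagement:

```python
# Hypothetical sketch, not a real application: a coupon endpoint with a
# business logic flaw that static analysis has no reason to flag.
from flask import Flask, jsonify, session

app = Flask(__name__)
app.secret_key = "demo-only"   # illustrative; never hard-code secrets

COUPON_DISCOUNT = 0.20         # meant to be a one-time 20% discount

@app.post("/cart/apply-coupon")
def apply_coupon():
    # No injection, no unsafe API call, no tainted data flow -- a scanner stays quiet.
    # The flaw is stateful: nothing records that the coupon was already used,
    # so replaying this request stacks the discount until the price hits zero.
    price = session.get("price", 100.0)
    session["price"] = round(price * (1 - COUPON_DISCOUNT), 2)
    return jsonify(price=session["price"])

if __name__ == "__main__":
    app.run()
```

Exercised at runtime, an attacker simply replays the request until the order is free – exactly the kind of finding that only emerges by interacting with a live system and observing how its state changes.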
We have seen this firsthand:
- Our mission is to design AI that thinks like a hacker, not a chatbot. That is why, rather than building on top of general-purpose LLMs like Claude, we built our own small, deterministic model that behaves like a real attacker.
- On constrained web exploitation challenges validated in a live browser, Claude Sonnet 4 peaked at 64% accuracy, while Novee’s model reached 90% – because it is optimized for interactive, adversarial reasoning rather than generic text prediction.
- In our research on two widely used PDF engines, our AI agent uncovered 16 new zero-day vulnerabilities – not by passively reviewing code, but by interacting with real implementations and validating exploitability in practice.
In short, offensive security is not fundamentally a code comprehension problem. It is an interactive, stateful reasoning problem, grounded in how systems behave under pressure.
AI-driven SAST will continue to improve, and it will rightly pressure legacy tooling. The baseline for automated code review is rising quickly, but security executives know that a stronger security posture will not come from better scanning of repositories alone – AI-powered or not.
That’s why security leaders are turning towards AI penetration testing.
Bridging the Gap Between Theory and Actual Risk
Ultimately, CISOs are not held accountable for theoretical weaknesses in a codebase. They are responsible for whether someone can break into their system today.
Answering that question requires more than identifying potential flaws. It requires black-box exploration, runtime validation of exploitability, contextual understanding of integrations and configurations, and continuous retesting as environments evolve. In other words, it requires AI that operates the way attackers operate – persistently, adaptively, and against live systems.
Frontier labs like Anthropic’s are demonstrating that AI can reason meaningfully about security in code. The next stage, and the one that matters most for enterprise risk, is context-aware offensive validation – AI that can reason meaningfully in production.
The organizations that embrace this shift will gain something far more important than static findings: the confidence that their teams can reduce real risk as fast as attackers create it.
If you want to see how continuous, attacker-level validation works in practice, book a demo.