Breaking Down the AI Pentesting Market: Novee CEO Ido Geffen on CyberRisk TV at RSAC
Novee CEO Ido Geffen sits down with CyberRisk TV to discuss finding vulnerabilities and fixing them before the bad guys can exploit them.
At RSAC 2026, Novee CEO Ido Geffen spoke with CyberRisk TV’s Josh Marpet about need-to-know information for security leaders looking at AI penetration testing solutions: what it actually takes to build an AI penetration testing platform, and the key differences between vulnerability scanners, exploit scanners, and true AI penetration testing.
Here are the highlights.
1 – When it comes to getting actionable results from your autonomous security tests, not all AI pentesters are created equal.
The AI security testing market is getting crowded with vulnerability scanners and exploit scanners built on top of frontier LLMs. These tools can find vulnerabilities faster than humans can, and at scale. But not every vulnerability is a real risk, and trading accuracy for scale leaves security leaders with a deluge of false positives and noise. The core issue is using a generic AI model to mimic the deep intuition and multi-step attack logic of real attackers: the results are broad, but shallow.
Novee's solution was instead to take an open-source foundation model and fine-tune it with reinforcement learning, using techniques derived from our elite human operators. That training took months to get right, and it's ongoing. Ido and Josh discuss the Novee cyber-range – internally called the "Novee Gym" – where thousands of open-source applications are reconstructed from the ground up, and the models (a.k.a. "athlete AIs") train and benchmark continuously against that environment.
For a deep-dive into how we did it, read our blog.
2 – Business logic vulnerabilities are the hardest to catch. And the most valuable to find.
A scanner can flag a known CVE, but it can’t understand how your application is supposed to work, and therefore can’t identify where it doesn’t. Business logic vulnerabilities live in the gap between intended behavior and actual behavior: an authorization flow that breaks under a specific sequence of user actions, a permission boundary that doesn’t hold when a role is downgraded mid-session.
Finding these vulnerabilities requires context. That means understanding the application’s specific workflows, reading available documentation, and scoping testing to the areas of genuine risk for that organization. It also means not flagging behavior that’s intentional; a platform that treats every permissive design choice as a vulnerability will bury real findings in noise.
3 – There’s a meaningful difference between a vulnerability scanner, an exploitability scanner, and a true penetration test.
Josh Marpet drew a distinction that cuts to the heart of how the market is stratified: vulnerability scanning identifies potential weaknesses, exploitability scanning maps the terrain and confirms a weakness is likely real, but true penetration testing goes all the way. It steals the gold from the safe, rather than just casing the bank.
Most AI pentesting tools on the market today sit in the middle tier. They can tell you a weakness exists and is probably exploitable. What they typically can't deliver is a working proof-of-concept with full reproduction steps: the kind of evidence that removes all ambiguity about whether a vulnerability is real and how bad the impact could be. That's the difference between an alert and a confirmed exploit.
When evaluating vendors, ask specifically: do you deliver a working proof-of-concept for every critical finding? And what does that output actually look like?
4 – Remediation guidance should be part of the deliverable, not a separate engagement.
A confirmed exploit that comes without a clear path to fixing it creates a different kind of problem for security teams. Generic OWASP references don’t close tickets. Engineering teams need to know what to change, in their specific stack and configuration, and why the vulnerability is exploitable in the first place.
The advice for security leaders: look for platforms that explain the full impact of a finding in context – what's exposed, what could be accessed, and what specifically needs to change. The closer that guidance is to something an engineer can act on directly, the faster the exposure window closes.
5 – Pre-test transparency should be guaranteed.
Any platform running offensive operations against a production environment should be able to show you exactly what it's going to do before it does it. A preliminary report outlining scope, test cases, and methodology should be a baseline requirement, not a nice-to-have.
6 – Continuous testing is the future, but ultimately testing cadence should align with your environment and your business needs.
Point-in-time testing made sense when applications changed slowly, but modern environments ship code continuously and add new integrations all the time. Attack surfaces expand between scheduled assessments, and continuous testing closes that gap – but it has to be scoped and controlled correctly to run safely in production.
Most teams that have the option choose continuous. For those that prefer a defined cadence, that's a valid choice too. The key is that testing frequency should reflect how fast your environment actually changes.
7 – Custom applications and AI-enabled systems must be fully in scope.
Off-the-shelf SaaS pentesting covers a narrow slice of most organizations’ actual attack surface. Custom-built applications, internal tools, and – increasingly – AI-enabled systems like chatbots, LLM-powered workflows, and autonomous agents are where real business logic lives, and where novel vulnerabilities are most likely to hide.
Any serious offensive security program needs to account for the full external attack surface, not just the parts that are easiest to test.
Watch the full interview with Ido Geffen and Josh Marpet on CyberRisk TV here. And to see Novee in action for yourself, schedule a demo.