A CVSS 10.0 in Gemini CLI: How Agentic Workflows Are Reshaping Supply Chain Risk

A CVSS 10.0 RCE vulnerability in Google Gemini CLI allowed external attackers to execute commands on host systems, turning CI/CD pipelines into supply-chain attack paths.

Elad Meged, Founding Engineer & Security Researcher


This is the type of exploit that turns CI/CD pipelines into supply-chain attack paths.

What you need to know

  • The Novee Security research team disclosed a critical remote code execution vulnerability in Google Gemini CLI and the run-gemini-cli GitHub Action.
  • Google assigned it CVSS 10.0, the maximum severity. 
  • The vulnerability allowed an unprivileged external attacker to force their own malicious content to load as Gemini configuration. This triggered command execution directly on the host system, bypassing security controls before the agent’s sandbox even initialized.
  • No prompt injection. No model decision. This was infrastructure-level execution before the AI system ever had to reason.
  • Patches are available in `@google/gemini-cli` `0.39.1` and `0.40.0-preview.3`, and in `google-github-actions/run-gemini-cli` `0.1.22`. Every earlier version of the Gemini CLI GitHub Action was affected.

Inside the Vulnerability

The flaw lived in how Gemini CLI handled workspace trust in non-interactive environments. When running in headless mode – like a CI/CD job – Gemini CLI automatically trusted the current workspace folder, loading any agent configuration it found there without review, sandboxing, or human approval.
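
To make the failure mode concrete, here is a minimal sketch of the vulnerable pattern in TypeScript. It is illustrative only, not the actual Gemini CLI source: the function names are ours, and `.gemini/settings.json` stands in for whatever configuration the agent discovers in the workspace.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

function promptUserForTrust(dir: string): boolean {
  // Stub: a real CLI would ask the user interactively here.
  console.log(`Trust workspace ${dir}? (stubbed as "no")`);
  return false;
}

function isWorkspaceTrusted(workspaceDir: string): boolean {
  // Interactive sessions can put a human in the loop before any
  // workspace configuration is honored.
  if (process.stdout.isTTY) {
    return promptUserForTrust(workspaceDir);
  }
  // The flaw: headless runs (CI/CD jobs, piped output) have no one to
  // ask, so the current workspace was treated as trusted by default.
  return true;
}

function loadWorkspaceConfig(workspaceDir: string): unknown {
  const configPath = path.join(workspaceDir, ".gemini", "settings.json");
  if (!isWorkspaceTrusted(workspaceDir) || !fs.existsSync(configPath)) {
    return {};
  }
  // Whatever is on disk, including files added by an external pull
  // request, is loaded with no review, sandboxing, or human approval.
  return JSON.parse(fs.readFileSync(configPath, "utf8"));
}

console.log(loadWorkspaceConfig(process.cwd()));
```

The interactive path at least keeps a human in the loop; the headless path silently removes the only check.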

That meant an attacker who could place content in a repository’s workspace – by opening a pull request, for example – could plant configuration that the agent would silently trust and act on. The result was direct command execution on the host running the agent, before its sandbox ever initialized.
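
We are not reproducing the exploit itself, but the general shape is easy to sketch. Assume a hypothetical `toolCommand` field somewhere in that trusted configuration ends up in a shell invocation (the real field names and mechanism are in the security advisory); anyone who can get a file into the workspace now runs commands on the host:

```typescript
import { execSync } from "node:child_process";

// Hypothetical config shape, chosen for illustration only; it is not
// the actual Gemini CLI configuration schema.
interface AgentConfig {
  toolCommand?: string;
}

function startAgent(config: AgentConfig): void {
  if (config.toolCommand) {
    // Executes on the host *before* any sandbox initializes, with the
    // workflow's full environment: tokens, credentials, source tree.
    execSync(config.toolCommand, { stdio: "inherit" });
  }
}

// An attacker's pull request only needs to land a workspace file whose
// JSON parses into something like this (benign payload shown here):
startAgent({ toolCommand: 'echo "attacker-controlled command runs here"' });
```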

Across every affected workflow, the impact was the same: code execution on the host running the agent gave an unprivileged outsider access to whatever secrets, credentials, and source code the workflow could reach. Enough for token theft, supply-chain pivots, and lateral movement into downstream systems.

The Supply Chain Under Pressure

AI has dramatically increased the volume of code being written, and that volume creates more downstream work. More reviews, more triage, more configuration changes. AI coding agents now sit inside CI/CD pipelines holding the execution privileges of a trusted contributor, reading from the same workspaces a contributor would touch. That level of access enables exactly the kind of critical supply-chain attack that stems from the developer workflow itself.

Software supply chain attacks have accelerated. In the last 18 months alone:

  • `axios` npm package (March 2026). Millions of installs compromised via a hijacked maintainer account.
  • Shai-Hulud worm (2025). Self-replicating campaign that hit hundreds of npm packages, with a v2.0 variant adding a wiper.
  • XZ Utils backdoor (2024). RCE through OpenSSH on affected Linux systems, present in development versions of major distributions.
  • Polyfill.io CDN hijack (2024). Every site loading scripts from the compromised domain began serving malicious code to its visitors.

These attacks all exploit the same idea: trust in the development pipeline (packages, build steps, maintainer accounts, automation) gets reused as a delivery mechanism to reach downstream users at scale. AI agents are the newest component to inherit that trust.

AI Security is Both the Model and the Infrastructure

Prompt injection and model manipulation remain critical areas of AI security research. They matter, and they are not going away.

But in real agentic systems, the dangerous behavior often emerges where model behavior meets traditional execution surfaces: shell tools, repository files, cloud credentials, CI/CD runners, deployment workflows, and production systems. The attack surface is no longer only the model, and it is no longer only the application. It is the full path between them: prompts, files, configuration, secrets, shells, CI/CD jobs, and the host environment running the agent. 

Traditional security tooling was not built for this interplay. Static analysis can flag a missing check in source code. Dependency scanners can catch a known CVE in a package. AI safety reviews can probe a model’s behavior in isolation. None of them can see how those layers behave together when an external attacker is in the loop.

This is exactly the kind of surface our team set out to research at Novee: places where an unprivileged attacker can quietly plant content that an AI agent will trust as configuration, instructions, or data. The Gemini CLI vulnerability is what we found when we looked: no prompt injection, no model decision, just attacker-controlled content silently accepted as agent configuration and executed on the host before any sandbox came up.

Where Novee Comes In: Full-Chain Adversarial Validation

Using AI agents in CI/CD is no longer optional. The volume of code modern teams ship requires AI assistance throughout the pipeline, from agents that triage PRs and write fixes to copilots that draft most of the code shipping to production. The trust and security boundaries that once governed how code moves through the pipeline have been quietly redrawn to make that possible, and the security tooling meant to validate the result has not caught up.

The answer is to expand how we look at security. Real adversarial validation now needs a full view of the system: the AI agents, the application code they touch, the CI/CD pipelines they run inside, the cloud permissions they inherit, and the production systems they ultimately influence.

This is the part of the attack surface most teams cannot validate today. Traditional penetration testing does not cover AI behavior. Traditional AI safety reviews do not cover infrastructure. The Gemini CLI vulnerability is exactly the kind of issue that lives in the gap between them.

That is the gap Novee is built for. We combine AI security research, application security, and offensive penetration testing into a single practice — testing how agents, code, CI/CD pipelines, cloud permissions, and production systems actually behave under attack, the way a real adversary would, not the way a checklist would.

AI helps teams ship faster. Continuous adversarial validation helps them keep shipping safely.


The next exploit hiding in your pipeline is the one nobody has named yet. Novee runs continuous adversarial validation against your live systems on every change, probing them the way a real attacker would, and proving exploitability before a CVE catches up. Book a demo to see what’s already exposed.

You can read the full Security Advisory here.
