Home Press center Indirect Prompt Injections: Vulnerabilities in AI Auditors of Enterprise Code

Cloudflare News

Published: may 11, 2026

Indirect Prompt Injections: Vulnerabilities in AI Auditors of Enterprise Code

The article is also available at:

Ukrainian, Russian

Cloudforce One analyzed 18,400 API calls to determine how cybercriminals are using linguistic deception and file structure to bypass automated AI code auditors. The findings show that intelligent code inspection systems lose their ability to detect threats when manipulative comments account for less than 1% of a file or when a malicious payload is hidden inside arrays larger than 3 MB. This indirect attack mechanism turns defensive tools into a vulnerable link in the IT infrastructure, demanding new approaches to data preprocessing.

Indirect Prompt Injections: Vulnerabilities in AI Auditors of Enterprise Code - image 1

The Challenge

In March 2026, researchers at Cloudforce One, a division of Cloudflare, identified attempts to manipulate detection systems through indirect code prompt injection (IDPI). Instead of relying on traditional exploits, an attacker embeds hidden instructions into source code to alter the reasoning logic of a large language model.

While monitoring the Workers platform, the team discovered VPN and proxy tunneling scripts containing an unusual anomaly: thousands of lines of multilingual “Notice to AI” headers. These were natural-language instructions designed to convince the automated auditor that the solution was completely safe and legitimate. Models often prioritize authoritative text within their context window while disregarding their own threat-detection training.

The Scale

To assess the impact of these manipulations, the team conducted a large-scale experiment, analyzing 18,400 API calls based on 100 confirmed malicious scripts. The testing covered seven different AI models, split between flagship systems with advanced reasoning capabilities and baseline high-speed solutions.

The volume of manipulative insertions ranged from a single comment to 1,000 per script, placed randomly throughout the text. This approach made it possible to simulate realistic development structures and uncover the hidden thresholds at which an AI auditor begins to misclassify dangerous objects as fully legitimate.

Functionality

The research revealed a so-called U-shaped curve in manipulation effectiveness, where a moderate number of comments successfully reduces detection rates, while extreme volumes actually heighten system scrutiny. The highest risk emerges in the “bypass zone,” when such statements account for less than 1% of the total file size.

Within that range, the average detection rate drops from 67.3% to 53.3%. By contrast, inserting more than 1,000 comments acts as a recognizable fraud signature. In these cases, most models interpret text saturation with safety claims as an indicator of malicious intent, driving the blocking rate up to 99%.

Obfuscation

The most concerning finding was that structural concealment outperforms linguistic deception. Placing malicious logic inside files larger than 3 MB that include massive legitimate libraries such as the React SDK overloads the model’s context window and reduces threat-detection effectiveness to 12–18%.

In addition, AI auditors exhibit specific linguistic biases. For example, some models treat comments in Russian or Chinese as a strong risk signal, sharply increasing detection rates, while Estonian occasionally reduced the perceived suspiciousness of the analyzed script.

Architecture

Protecting automated inspection pipelines requires a deep redesign of structure-preparation processes before content is submitted for analysis. A critical step is the implementation of automated comment stripping, which fully neutralizes linguistic influence on AI models.

Additional protection comes from deliberately excluding standard code from third-party components during analyzer execution, allowing the auditor to focus exclusively on the developer’s own logic. Experts also emphasize the need to anonymize variable names to prevent “friendly” or neutral naming from influencing the system’s final verdict.

Evolution

Beyond directly bypassing inspections, extreme volumes of informational noise can trigger a structural breakdown in the reasoning of flagship AI models. Instead of classifying a threat, the system refuses to produce a response and generates incoherent text or errors. This creates a state of functional paralysis in which automated cybersecurity controls simply cannot issue a command to block object execution. Ultimately, modern organizations need to do more than deploy AI capabilities—they need to integrate AI into a properly engineered analytics process that is stripped of contextual noise.

Integrating autonomous agents into code review workflows is opening new opportunities for technology-driven businesses, but it is also turning the models themselves into targets for manipulation. Effective protection depends on sound architecture: eliminating linguistic traps, focusing on target scenarios, and preventing context fatigue caused by excessive input volume.

As an official Cloudflare solutions distributor, iIT Distribution provides expert support for the deployment of modern information security systems. The iIT Distribution team supports projects end to end—from architectural risk assessment to the deployment and configuration of threat detection platforms—helping partners protect enterprise infrastructure with flexibility and confidence.

News