Tech 7 min read

Injection Attacks on AI Agent Memory and Automated Smart Contract Exploitation with EVMbench

It was a week when the relationship between AI and security shifted from one-way to two-way.

On one side, multiple academic papers and real-world incidents reported attack techniques that target AI agents’ memory files. AI is becoming the party that “gets attacked.” On the other side, a joint study by OpenAI and Paradigm showed that an AI agent can autonomously exploit smart-contract vulnerabilities with a success rate above 70%. AI is becoming the party that “launches attacks.”

Let’s map this two-way structure through concrete examples from each side.


AI agents’ memory files are a new target

AI agents such as Claude Code, Cursor, and Windsurf load configuration and memory files into the context window at startup. From the LLM’s perspective, it cannot distinguish “system instructions” from “text loaded from memory files”—and this is now attracting researchers’ attention as a new attack target. The situation where vulnerabilities in development tools themselves become an attack surface has also been reported in cases with Chrome DevTools and VSCode Copilot.

From late 2025 into 2026, multiple attack techniques targeting the memory layer of AI agents have been reported as academic papers and real-world incidents.

Memory poisoning via queries alone: MINJA and InjecMEM

MINJA (Memory INJection Attack), presented at NeurIPS 2025, poisons the memory bank using only normal queries to the agent. Attackers do not need direct access to files; a technique called the “bridging step” ties seemingly harmless queries to a harmful chain of reasoning. The poisoned memory persists even after subsequent normal sessions.

InjecMEM, submitted to ICLR 2026, achieves targeted memory poisoning in a single interaction. The payload is split into a retriever-agnostic anchor and a gradient-optimized trigger, functioning as a “sleeper agent” that activates only for specific queries. What makes it particularly troublesome is that once the attack succeeds, the poisoning does not disappear even if the agent is used benignly thereafter.

Indirect injection via content

Unit 42 at Palo Alto Networks demonstrated attacks against Amazon Bedrock Agents. Payloads embedded in web pages use forged XML tags to make the agent treat malicious content as system instructions. When the agent fetches a URL, the hidden instructions are incorporated into the session summary and persist via long-term memory—every time the agent accesses the web, the attack surface expands.

Large-scale poisoning via the supply chain: the ToxicSkills campaign

In the ToxicSkills campaign uncovered in February 2026, Snyk audited 3,984 agent skills from ClawHub and found some security issue in 36.82% (1,467), confirming 76 as malicious skills.

These skills combine traditional code exploits with prompt injection and write backdoors into identity files such as SOUL.md and MEMORY.md at install time. A key characteristic is that the file modifications remain even after uninstall, leaving an indelible trace of infection.

MMNTM’s analysis also points out the “Ship of Theseus” pattern. By accumulating incremental edits, an attacker can pass hash-based integrity checks while eventually rewriting the entire identity file into something else.

MCP tool poisoning adds to this. In the MCPTox benchmark (2026), instructions hidden in tool descriptions achieved a 72.8% attack success rate against o1-mini. Similar attacks via npm packages have also been confirmed; in the injection method called SANDWORM_MODE, the MCP injection technique performs an end‑to‑end chain from the npm supply chain to agent memory poisoning. Supply‑chain attacks targeting AI coding tools themselves, such as Clinejection, have also been reported. The Phantom Commit Injection, which abuses GitHub fork‑commit sharing, is another repository‑level poisoning technique to keep in mind.

Defense via database architecture

As countermeasures to such attacks, proposals include a memory…405 chars truncated… audit the skill supply chain (one in three public skills has issues) 5. Apply access control to identity files at the credentials level 6. Separate and sandbox processing of external content from memory operations


AI autonomously exploiting smart‑contract vulnerabilities: the shock of EVMbench

With the “AI gets attacked” side covered, next is the “AI attacks” side.

OpenAI and crypto VC Paradigm jointly released EVMbench—an open‑source benchmark that quantitatively evaluates an AI agent’s ability to detect, patch, and exploit vulnerabilities in Ethereum‑family smart contracts.

It was developed from the concern: “Over $100B in assets are stored on open‑source contracts. As LLMs rapidly improve their exploit‑discovery capabilities, we need to make that risk visible.”

What does the benchmark measure?

EVMbench consists of three task categories:

  • Detect: find vulnerabilities in a contract
  • Patch: fix the vulnerable code
  • Exploit: construct a transaction that actually exploits the vulnerability to drain funds

The dataset consists of real vulnerabilities collected from Code4rena’s open audits plus custom tasks from unpublished contracts (120 tasks in total, extracted from 40 audits). Each task is containerized, and agents operate in a realistic environment. A Rust harness deploys the contracts and deterministically reproduces and verifies the agent’s transactions.

Exploit success rate more than doubled in half a year

The numbers stood out.

ModelExploit Success Rate
Project start (~2025)Under 20%
GPT-531.9%
GPT-5.3-Codex72.2%

Roughly six months passed between GPT‑5 and GPT‑5.3‑Codex. During that period, the exploit success rate jumped from 31.9% to 72.2%—more than doubling. “The rate of improvement is incredible,” said Paradigm’s Alpin Yukseloglu.

Impact on DeFi security

Once deployed, smart contracts are basically immutable. If exploited, assets drain as they are. These results indicate that AI agents have reached a stage where they can autonomously exploit “high‑severity, fund‑draining bugs” with a success rate above 70%.

The targets are actual vulnerabilities found in Code4rena’s competitive audits; this is not synthetic academic data. Before attackers begin using such AI tools, the focus becomes how quickly defenders can roll out AI auditing. Around the same time, a 106‑country FortiGate scan combining DeepSeek and Claude was reported—embedding AI into attack infrastructure has already begun.

The benchmark, auditing agents, and datasets are all open‑source, and a joint academic paper by Paradigm and OpenAI was released concurrently.


The collapse of attack asymmetry

Placing the two trends side by side shows a shift away from the classical asymmetry where “attackers have the advantage.”

Defenders are also advancing AI‑driven vulnerability discovery—see, for example, Anthropic’s Claude Code Security finding 500+ previously unknown vulnerabilities in production OSS. Looking at the technical details of Claude Code Security shows that AI is starting to produce solid results as a defensive tool.

However, the two datasets here point beyond merely “using AI as a tool.” An agent’s memory itself becomes infected, and AI autonomously breaks smart contracts. Both are attack classes that emerge only after AI has settled in as infrastructure.

The ToxicSkills figure—one in three public skills has flaws—plainly shows that ecosystem maturity is outpacing security hardening. For design principles to run AI agents safely in production, see the article on production rollout principles, and for real incidents, see the analysis of failure modes.

The wave of vulnerabilities reported in February—Dell RecoverPoint’s CVSS 10.0 zero‑day, multiple CVEs added to CISA KEV, and pnpm’s lockfile bypass and RCE—also indicates that the set of targets AI agents can autonomously find and exploit continues to expand.


References: