Poisoned Tool Descriptions Can Hijack AI Agents to Exfiltrate Data

Attackers can embed hidden commands in tool descriptions that trick AI agents into sending sensitive data to external servers without triggering alerts.

CSBadmin
2 Min Read

How the Attack Works

Microsoft researchers have uncovered a method where attackers can manipulate AI agents by poisoning the descriptions of tools these agents rely on. The Model Context Protocol (MCP) allows AI agents to interact with external tools, and each tool includes a plain text description that tells the agent when and how to use it. An attacker can update a tool’s description with hidden instructions, such as grabbing and forwarding sensitive data, without changing the tool’s visible name or summary. The agent, unable to distinguish between legitimate instructions and malicious ones embedded in the description, follows the hidden orders. This means the agent remains compliant with rules, but the attacker can quietly exfiltrate data like invoices or internal files to an external server.

Impact and Defensive Measures

The attack exploits a trust gap between the AI agent and the tools it uses, not a flaw in the agent itself. Researchers demonstrated the technique with a finance agent handling invoices, where a poisoned description caused the agent to collect unpaid invoices and send them to the attacker. This class of attack has been documented since 2025, with examples like the postmark-mcp npm package that secretly BCC’d emails. Microsoft advises treating every connected tool as part of the supply chain, limiting agent permissions to specific tools, reviewing description changes like code changes, and requiring human approval for high-risk actions like data transfers or financial transactions. Logging agent activity and establishing behavioral baselines can help detect unusual patterns.

Source: The Hacker News

CSBadmin

The latest in cybersecurity news and updates.

TAGGED:
Share This Article
Follow:
The latest in cybersecurity news and updates.