How the Attack Works
Microsoft researchers have uncovered a method where attackers can manipulate AI agents by poisoning the descriptions of tools these agents rely on. The Model Context Protocol (MCP) allows AI agents to interact with external tools, and each tool includes a plain text description that tells the agent when and how to use it. An attacker can update a tool’s description with hidden instructions, such as grabbing and forwarding sensitive data, without changing the tool’s visible name or summary. The agent, unable to distinguish between legitimate instructions and malicious ones embedded in the description, follows the hidden orders. This means the agent remains compliant with rules, but the attacker can quietly exfiltrate data like invoices or internal files to an external server.
Impact and Defensive Measures
The attack exploits a trust gap between the AI agent and the tools it uses, not a flaw in the agent itself. Researchers demonstrated the technique with a finance agent handling invoices, where a poisoned description caused the agent to collect unpaid invoices and send them to the attacker. This class of attack has been documented since 2025, with examples like the postmark-mcp npm package that secretly BCC’d emails. Microsoft advises treating every connected tool as part of the supply chain, limiting agent permissions to specific tools, reviewing description changes like code changes, and requiring human approval for high-risk actions like data transfers or financial transactions. Logging agent activity and establishing behavioral baselines can help detect unusual patterns.
Source: The Hacker News

