ServiceNow recherche

Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning

Résumé

The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user, has sparked strong interest in improving these capabilities by explicitly fine-tuning the LLMs/VLMs that power these agents. Several researchers have proposed collecting data by letting the agents interact with their environment (e.g., a computer operating system, the web or a collection of APIs exposed as tools), and improve agent performance by fine tuning on this data. In this work, we show that such data collection can be manipulated by adversaries to insert poisoned traces. By modifying just 5% of collected traces, adversaries can embed stealthy bad behaviors into agents—like leaking confidential user information whenever the tool or webpage exposes a trigger. Our results raise important security concerns in the development of AI agents, and underscore the importance of careful scrutiny of all data collection processes used to improve agentic AI.

Publication
Workshop at the International Conference of Machine Learning (ICML)
Léo	Boisvert
Léo Boisvert
Visiting Researcher

Visiting Researcher at AI Frontier Research located at Montreal, QC, Canada.

Abhay Puri
Abhay Puri
Applied Research Scientist

Applied Research Scientist at AI Research Deployment​ located at Montreal, QC, Canada.

Jason Stanley
Jason Stanley
Head of AI Research Deployment​

Head of AI Research Deployment​ at AI Research Deployment​ located at Montreal, QC, Canada.

Nicolas Chapados
Nicolas Chapados
VP of Research

VP of Research at AI Research Management located at Montreal, QC, Canada.

Alexandre Drouin
Alexandre Drouin
Head of AI Frontier Research​

Head of AI Frontier Research​ at AI Frontier Research located at Montreal, QC, Canada.