ServiceNow Research
Safety and Security
DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Léo Boisvert, Abhay Puri, Gabriel Huang, Avinandan Bose, Alexandre Drouin, Alexandre Lacoste, Krishnamurthy (Dj) Dvijotham, Chandra Kiran Reddy Evuru, Maryam Fazel, Quentin Cappart, Jason Stanley, Mihir Bansal
Conference on Language Modeling (COLM), 2025.
DoomArena: A Framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Léo Boisvert, Abhay Puri, Gabriel Huang, Mihir Bansal, Chandra Kiran Reddy Evuru, Avinandan Bose, Quentin Cappart, Maryam Fazel, Alexandre Lacoste, Alexandre Drouin, Jason Stanley, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user has sparked strong interest …
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Joshua Kazdan, Avinandan Bose, Quentin Cappart, Maryam Fazel, Sai Rajeswar Mudumba, Jason Stanley, Nicolas Chapados, Alexandre Drouin, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.
No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data
Leading language model (LM) providers like OpenAI and Google offer fine-tuning APIs that allow customers to adapt LMs for specific use …
Joshua Kazdan, Krishnamurthy (Dj) Dvijotham, Sanmi Koyejo
Workshop at the International Conference on Learning Representations (ICLR), 2025.