ServiceNow AI Research
Safety and Security
Attack What Matters: Integrating Expert Insight and Automation in Threat-Model-Aligned Red Teaming
Prompt injection attacks target a key vulnerability in modern large language models: their inability to reliably distinguish between …
Kiarash Mohammadi, Abhay Puri, Georges Belanger Albarran, Mihir Bansal, Navdeep Gill, Yanick Chénard, Segan Subramanian, Marc-Etienne Brunet, Jason Stanley
NOW AI, 2025.
Shifting AI Security to the Left: Design-Time Defenses to Mitigate the Risks of Prompt Injections
Prompt injections pose a critical weakness for modern Large Language Models, making it difficult for AI to distinguish between …
Abhay Puri, Kevin Kasa, Kiarash Mohammadi, Georges Belanger Albarran, Mihir Bansal, Yanick Chénard, Marc-Etienne Brunet, Jason Stanley
NOW AI, 2025.
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Léo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy (Dj) Dvijotham
Conference on Language Modeling (COLM), 2025.
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Léo Boisvert, Abhay Puri, Gabriel Huang, Mihir Bansal, Chandra Kiran Reddy Evuru, Avinandan Bose, Quentin Cappart, Maryam Fazel, Alexandre Lacoste, Alexandre Drouin, Jason Stanley, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user, has sparked strong interest …
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Joshua Kazdan, Avinandan Bose, Quentin Cappart, Maryam Fazel, Sai Rajeswar Mudumba, Jason Stanley, Nicolas Chapados, Alexandre Drouin, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.
No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data
Leading language model (LM) providers like OpenAI and Google offer fine-tuning APIs that allow customers to adapt LMs for specific use …
Joshua Kazdan, Krishnamurthy (Dj) Dvijotham, Sanmi Koyejo
Workshop at the International Conference on Learning Representations (ICLR), 2025.