ServiceNow AI Research
Tags
Safety and Security
Attack What Matters: Integrating Expert Insight and Automation in Threat-Model-Aligned Red Teaming
Prompt injection attacks target a key vulnerability in modern large language models: their inability to reliably distinguish between …
Kiarash Mohammadi, Abhay Puri, Georges Belanger Albarran, Mihir Bansal, Navdeep Gill, Yanick Chénard, Segan Subramanian, Marc-Etienne Brunet, Jason Stanley
NOW AI, 2025.
Shifting AI Security to the Left: Design-Time Defenses to Mitigate the Risks of Prompt Injections
Prompt injections pose a critical weakness for modern Large Language Models, making it difficult for AI to distinguish between …
Abhay Puri, Kevin Kasa, Kiarash Mohammadi, Georges Belanger Albarran, Mihir Bansal, Yanick Chénard, Marc-Etienne Brunet, Jason Stanley
NOW AI, 2025.
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Léo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy (Dj) Dvijotham
Conference on Language Modeling (COLM), 2025.
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Léo Boisvert, Abhay Puri, Gabriel Huang, Mihir Bansal, Chandra Kiran Reddy Evuru, Avinandan Bose, Quentin Cappart, Maryam Fazel, Alexandre Lacoste, Alexandre Drouin, Jason Stanley, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user, has sparked strong interest …
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Joshua Kazdan, Avinandan Bose, Quentin Cappart, Maryam Fazel, Sai Rajeswar Mudumba, Jason Stanley, Nicolas Chapados, Alexandre Drouin, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.
No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data
Leading language model (LM) providers like OpenAI and Google offer fine-tuning APIs that allow customers to adapt LMs for specific use …
Joshua Kazdan, Krishnamurthy (Dj) Dvijotham, Sanmi Koyejo
Workshop at the International Conference on Learning Representations (ICLR), 2025.