ServiceNow AI Research
Cybersecurity
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
Leading language model (LM) providers like OpenAI and Anthropic allow customers to fine-tune frontier LMs for specific use cases. To …
Joshua Kazdan, Abhay Puri, Rylan Schaeffer, Lisa Yu, Chris Cundy, Jason Stanley, Sanmi Koyejo, Krishnamurthy (Dj) Dvijotham
International Conference on Learning Representations, 2026.
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
The rise of AI agents that can use tools, browse the web, and interact with computers on behalf of a user has sparked strong interest …
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Joshua Kazdan, Avinandan Bose, Quentin Cappart, Maryam Fazel, Sai Rajeswar Mudumba, Jason Stanley, Nicolas Chapados, Alexandre Drouin, Krishnamurthy (Dj) Dvijotham
Workshop at the International Conference on Machine Learning (ICML), 2025.