ServiceNow Research

Adversarial Attacks

Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data …
Constraining Representations Yields Models That Know What They Don't Know
A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is …
Maximal Jacobian-based Saliency Map Attack
The Jacobian-based Saliency Map Attack is a family of adversarial attack methods for fooling classification models, such as deep neural …
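A minimal sketch of the core JSMA idea from Papernot et al. — score each input feature by how much increasing it raises the target logit while lowering the others, then perturb the most salient feature. The toy linear classifier, seed, step size `theta`, and the helper `jsma_saliency` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical toy setup: a linear "classifier" with logits W @ x,
# so the Jacobian of the logits w.r.t. the input is simply W.
rng = np.random.default_rng(0)
n_features, n_classes = 8, 3
W = rng.normal(size=(n_classes, n_features))
x = rng.normal(size=n_features)

def jsma_saliency(jacobian, target):
    """JSMA saliency map: feature i is salient when increasing it raises
    the target logit (J_t[i] > 0) while lowering the summed non-target
    logits (sum_{j != t} J_j[i] < 0); its score is the product of the two
    magnitudes, and non-salient features score 0."""
    jt = jacobian[target]               # d logit_target / d x_i
    others = jacobian.sum(axis=0) - jt  # summed gradient of the other logits
    salient = (jt > 0) & (others < 0)
    return np.where(salient, jt * np.abs(others), 0.0)

target = 1
saliency = jsma_saliency(W, target)

# Greedy step: bump the single most salient feature by theta (assumed value).
theta = 0.5
x_adv = x.copy()
x_adv[int(np.argmax(saliency))] += theta
```

In the full attack this greedy step is repeated (typically on pixel pairs, under an L0 budget) until the model's prediction flips to the target class; for a deep network the Jacobian would come from autograd rather than a fixed weight matrix.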