Generative AI

VCR: Visual Caption Restoration

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …

Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar Mudumba, Jie Fu, Bang Liu, Yoshua Bengio

International Conference on Learning Representations (ICLR), 2025.

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …

Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli

AAAI Demos, 2025.

AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents

Recent advancements in large language models (LLMs) have spurred interest in developing autonomous agents capable of performing complex …

Megh Thakkar, Léo Boisvert, Thibault Le Sellier De Chezelles, Alexandre Piche, Maxime Gasse, Alexandre Lacoste, Massimo Caccia

Workshop at Neural Information Processing Systems (NeurIPS), 2024.

BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks

Vision and language models that can accurately understand both images and text are crucial for deeper document understanding. These …

Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba

Workshop at Neural Information Processing Systems (NeurIPS), 2024.

Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think

Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing …

Massimo Caccia, Megh Thakkar, Léo Boisvert, Thibault Le Sellier De Chezelles, Alexandre Piche, Nicolas Chapados, Alexandre Drouin, Maxime Gasse, Alexandre Lacoste

Workshop at Neural Information Processing Systems (NeurIPS), 2024.

Multimodal Foundation World Models for Generalist Embodied Agents

Learning generalist agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement learning …

Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, Sai Rajeswar Mudumba

Neural Information Processing Systems (NeurIPS), 2024.

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …

João Monteiro, Pierre-André Noël, Étienne Marcotte, Sai Rajeswar Mudumba, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …

Léo Boisvert, Megh Thakkar, Maxime Gasse, Massimo Caccia, Thibault Le Sellier De Chezelles, Quentin Cappart, Nicolas Chapados, Alexandre Lacoste, Alexandre Drouin

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …

João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

Workshop at Neural Information Processing Systems (NeurIPS), 2024.

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …

Andrew Williams, Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Jithendaraa Subramanian, Roland Riachi, James Requeima, Alexandre Lacoste, Irina Rish, Nicolas Chapados, Alexandre Drouin

Foundation Models for Time Series, 2024.