Large Language Models

Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models

Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM’s capability to …

Nishanth Madhusudhan, Sathwik Tejaswi Madhusudhan, Vikas Yadav, Masoud Hashemi

International Conference on Computational Linguistics (COLING), 2025.

AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents

Recent advancements in large language models (LLMs) have spurred interest in developing autonomous agents capable of performing complex …

Megh Thakkar, Léo Boisvert, Thibault Le Sellier De Chezelles, Alexandre Piche, Maxime Gasse, Alexandre Lacoste, Massimo Caccia

Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …

Andrew Williams, Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Jithendaraa Subramanian, Roland Riachi, James Requeima, Alexandre Lacoste, Irina Rish, Nicolas Chapados, Alexandre Drouin

Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.

Evaluating Interventional Reasoning Capabilities of Large Language Models

Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners …

Tejas Kasetty, Divyat Mahajan, Gintare Karolina Dziugaite, Alexandre Drouin, Dhanya Sridhar

Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …

João Monteiro, Pierre-André Noël, Étienne Marcotte, Sai Rajeswar Mudumba, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …

João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.

Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy

This paper introduces a novel model compression approach through dynamic layer-specific pruning in Large Language Models (LLMs), …

Razvan-Gabriel Dumitru, Paul-Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences

Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (usually one chosen and rejected …

Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, Sathwik Tejaswi Madhusudhan

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …

João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

An Ecosystem for Web Agents: WorkArena, BrowserGym, AgentLab and more

The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those …

Alexandre Lacoste, Maxime Gasse, Thibault Le Sellier De Chezelles, Massimo Caccia, Léo Boisvert, Megh Thakkar, Alexandre Drouin, Nicolas Chapados

Montreal AI Symposium (MAIS), 2024.