Deployed machine learning systems require some mechanism to detect out-of-distribution (OOD) inputs. Existing research mainly focuses …

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.

Learning generalist agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement learning …

Neural Information Processing Systems (NeurIPS), 2024.

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.

The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.

This paper introduces a novel model compression approach through dynamic layer-specific pruning in Large Language Models (LLMs), …

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (usually one chosen and rejected …

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those …

Montreal AI Symposium (MAIS), 2024.

Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …

Montreal AI Symposium (MAIS), 2024.

We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including …

Montreal AI Symposium (MAIS), 2024.