1

Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection
Deployed machine learning systems require some mechanism to detect out-of-distribution (OOD) inputs. Existing research mainly focuses …
Multimodal foundation world models for generalist embodied agents
Learning generalist agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning …
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …
Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy
This paper introduces a novel model compression approach through dynamic layer-specific pruning in Large Language Models (LLMs), …
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (usually one chosen and rejected …
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …
An Ecosystem for Web Agents: WorkArena, BrowserGym, AgentLab and more
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those …
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …
TACTIS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series
We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including …