9

Unifying Autoregressive and Diffusion-Based Sequence Generation
We take significant steps toward unifying autoregressive and diffusion-based sequence generation by extending the SEDD discrete …
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development presents an exciting challenge for agentic AI. While existing benchmarks …
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to …
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents
Recent advancements in large language models (LLMs) have spurred interest in developing autonomous agents capable of performing complex …
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their …
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …
Evaluating Interventional Reasoning Capabilities of Large Language Models
Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners …
Fast Convergence of Softmax Policy Mirror Ascent for Bandits & Tabular MDPs

We analyze the convergence of a novel policy gradient algorithm (referred to as SPMA) for multi-armed bandits and tabular Markov …

Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing …