Multi-task retriever fine-tuning for domain-specific and efficient RAG
Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical …
Understanding the Influence of Synthetic Data for Text Embedders
Recent progress in developing general-purpose text embedders has been driven by training on synthetic LLM-generated data. Nonetheless, …
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail …
Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
Both PAC-Bayesian and Sample Compression learning frameworks have been shown to be instrumental for deriving tight (non-vacuous) generalization …
SafeArena: Evaluating the Safety of Autonomous Web Agents
LLM-based agents are becoming increasingly proficient at solving web-based tasks. With this capability comes a greater risk of misuse …
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Developing autonomous agents that can navigate diverse Graphical User Interfaces (GUIs) and solve complex tasks is essential for …
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many …
Fast Convergence of Softmax Policy Mirror Ascent
We analyze the convergence of a novel policy gradient algorithm (referred to as SPMA) for multi-armed bandits and tabular Markov …
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data …