1

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation

Neural sentence embedding models for dense retrieval typically rely on binary relevance labels, treating query-document pairs as …

DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts
Merging parameter-efficient task experts has recently gained growing attention as a way to build modular architectures that can be …
Unifying Autoregressive and Diffusion-Based Sequence Generation
We present significant extensions to diffusion-based sequence generation models, blurring the line with autoregressive language models. …
Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows
Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per token costs are …
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical …
Understanding the Influence of Synthetic Data for Text Embedders
Recent progress in developing general-purpose text embedders has been driven by training on synthetic LLM-generated data. Nonetheless, …
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail …
Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
Both PAC-Bayesian and Sample Compress learning frameworks have been shown instrumental for deriving tight (non-vacuous) generalization …
SafeArena: Evaluating the Safety of Autonomous Web Agents
LLM-based agents are becoming increasingly proficient at solving web-based tasks. With this capability comes a greater risk of misuse …