
Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs
Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, …
GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities
The rapid evolution of software libraries presents a significant challenge for code generation models, which must adapt to frequent …
How to Train Your LLM Web Agent: A Statistical Diagnosis
Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary …
Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training
We introduce a framework for optimizing domain-specific dataset construction in foundation model training. Specifically, we seek a …
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
We introduce AgentAda, the first LLM-powered analytics agent that can learn and use new analytics skills to extract more specialized …
Adaptive Diffusion Denoised Smoothing: Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion
We propose Adaptive Diffusion Denoised Smoothing, a method for certifying the predictions of a vision model against adversarial …
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a …
How to Train Your LLM Web Agent: A Statistical Diagnosis (Oral)
Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary …
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
The rise of AI agents that can use tools, browse the web and interact with computers on behalf of a user has sparked strong interest …
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development present an exciting challenge for agentic AI. While existing benchmarks …