
Faster On-Policy Reinforcement Learning for Long Sequence Generation
Reinforcement Learning (RL) is increasingly utilized to enhance the reasoning capabilities of Large Language Models (LLMs). However, …
FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering
We introduce DRBench, a benchmark for evaluating AI agents on complex, open-ended enterprise deep research tasks. Unlike existing …
Hinting Around: Helping Web Agents Solve Tasks via Hints
While web agents offer an avenue to solve a plethora of tasks due to their ability to navigate the web, they are still brittle and …
Revisiting Fine-Tuning for Task-Oriented Dialogues With ABCD 2.0
Frontier-grade Large Language Models (LLMs) that have been few-shot prompted to engage in dialogues with users seeking to accomplish …
Shifting AI Security to the Left: Design-Time Defenses to Mitigate the Risks of Prompt Injections
Prompt injections pose a critical weakness for modern Large Language Models, making it difficult for AI to distinguish between …
StarVLM ReRank: Better UI Grounding via Enhanced Visual Input and Element Position Perception
UI grounding is a fundamental task for enterprise workflow automation. This task maps natural language instructions to precise pixel …
Unifying Autoregressive and Diffusion-Based Sequence Generation
We present significant extensions to diffusion-based language models, blurring the line with autoregressive ones. We introduce …