ServiceNow AI Research

Web Agents

How to Train Your LLM Web Agent: A Statistical Diagnosis

Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary …

AgentLab Controller: Level Up Your Web Agent with Step-Through Debugging
Recent progress in building computer-using agents has enabled large language models to navigate browser environments and solve complex …
Hinting Around: Helping Web Agents Solve Tasks via Hints
While web agents offer an avenue to solve a plethora of tasks due to their ability to navigate the web, they are still brittle and …
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development presents an exciting challenge for agentic multimodal models. While …
How to Train Your LLM Web Agent: A Statistical Diagnosis (Oral)
Large language model (LLM) agents for web interfaces have advanced rapidly, yet open-source systems still lag behind proprietary …
SafeArena: Evaluating the Safety of Autonomous Web Agents
LLM-based agents are becoming increasingly proficient at solving web-based tasks. With this capability comes a greater risk of misuse …
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Developing autonomous agents that can navigate diverse Graphical User Interfaces (GUIs) and solve complex tasks is essential for …
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development presents an exciting challenge for agentic AI. While existing benchmarks …