ServiceNow AI Research

Agents

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …
Representing Positional Information in Generative World Models for Object Manipulation
The ability to predict outcomes of interactions between embodied agents and objects is paramount in the robotic setting. While …
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing …
An Ecosystem for Web Agents: WorkArena, BrowserGym, AgentLab and more
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those …
Multimodal foundation world models for generalist embodied agents
Learning generalist embodied agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement …
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on …
Evaluating In-Context Learning of Libraries for Code Generation
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly …
Efficient Dynamics Modeling in Interactive Environments with Koopman Theory
The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could …
IntentGPT: Few-shot Intent Discovery with Large Language Models
In today’s digitally driven world, dialogue systems play a pivotal role in enhancing user interactions, from customer service to …