ServiceNow AI Research

Agents

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Data analytics is essential for extracting valuable insights from data that can assist organizations in making effective decisions. We …
MMTEB: Massive Multilingual Text Embedding Benchmark

Text embeddings are typically evaluated on a narrow set of tasks, limited in terms of languages, domains, and task types. To circumvent …

LitLLMs, LLMs for Literature Review: Are We There Yet?

Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, …

StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability, referring to an LLM’s capability to …
The BrowserGym Ecosystem for Web Agent Research
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those …
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents
Recent advancements in large language models (LLMs) have spurred interest in developing autonomous agents capable of performing complex …
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing …
Multimodal foundation world models for generalist embodied agents
Learning generalist agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning …