ServiceNow recherche

Generative AI

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing …
An Ecosystem for Web Agents: WorkArena, BrowserGym, AgentLab and more
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those …
Multimodal foundation world models for generalist embodied agents
Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement …
Evaluating In-Context Learning of Libraries for Code Generation
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly …
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLM) have …
Exploring validation metrics for offline model-based optimisation with diffusion models
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward …
IntentGPT: Few-shot Intent Discovery with Large Language Models
In today’s digitally driven world, dialogue systems play a pivotal role in enhancing user interactions, from customer service to …
Self-evaluation and self-prompting to improve the reliability of LLMs
In order to safely deploy Large Language Models (LLMs), they must be capable of dynamically adapting their behavior based on their …
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on …