ServiceNow recherche

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

Résumé

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents’ ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

Publication
Workshop at the International Conference of Learning Representation (ICLR)
Alexandre Drouin
Alexandre Drouin
Head of AI Frontier Research​

Head of AI Frontier Research​ at AI Frontier Research located at Montreal, QC, Canada.

Massimo Caccia
Massimo Caccia
Research Scientist

Research Scientist at AI Frontier Research located at Montreal, QC, Canada.

Issam H. Laradji
Issam H. Laradji
Research Manager

Research Manager at AI Frontier Research located at Vancouver, BC, Canada.

David Vazquez
David Vazquez
Director of AI Research

Director of AI Research at AI Research Management located at Montreal, QC, Canada.

Nicolas Chapados
Nicolas Chapados
VP of Research

VP of Research at AI Research Management located at Montreal, QC, Canada.

Alexandre Lacoste
Alexandre Lacoste
Research Lead

Research Lead at AI Frontier Research located at Montreal, QC, Canada.