ServiceNow Research

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

Abstract

The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recent LLMs seem capable of planning and reasoning given user instructions, their effectiveness in applying these capabilities to autonomous task solving remains underexplored. This is especially true in enterprise settings, where automated agents hold the promise of high impact. To fill this gap, we propose WorkArena++, a novel benchmark consisting of 682 tasks corresponding to realistic workflows routinely performed by knowledge workers. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents. Our empirical studies across state-of-the-art LLMs and vision-language models (VLMs), as well as human workers, reveal several challenges that such models must overcome to serve as useful assistants in the workplace. In addition to the benchmark, we provide a mechanism to effortlessly generate thousands of ground-truth observation/action traces, which can be used for fine-tuning existing models. Overall, we expect this work to serve as a useful resource to help the community progress toward capable autonomous agents. The benchmark can be found at https://github.com/ServiceNow/WorkArena/tree/workarena-plus-plus

Publication
NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets)
Léo Boisvert
Visiting Researcher

Visiting Researcher at AI Frontier Research, located in Montreal, QC, Canada.

Massimo Caccia
Research Scientist

Research Scientist at AI Frontier Research, located in Montreal, QC, Canada.

Nicolas Chapados
VP of Research

VP of Research at AI Research Management, located in Montreal, QC, Canada.

Alexandre Lacoste
Research Lead

Research Lead at AI Frontier Research, located in Montreal, QC, Canada.

Alexandre Drouin
Head of AI Frontier Research

Head of AI Frontier Research at AI Frontier Research, located in Montreal, QC, Canada.