ServiceNow AI Research

Revisiting Fine-Tuning for Task-Oriented Dialogues With ABCD 2.0

Abstract

Frontier-grade Large Language Models (LLMs) that are few-shot prompted to engage in dialogues with users seeking to accomplish real-world tasks still perform poorly. To better understand the behaviour of such dialogue systems, we have developed the Action-Based Conversation Dataset (ABCD) 2.0, an enhanced task-oriented dialogue dataset designed to improve upon the original ABCD. ABCD comprises conversations in a customer service context, where an agent must maintain an accurate dialogue state while acting within the constraints imposed by a set of policy guidelines. ABCD 2.0 extends this work by explicitly annotating slots and values to align with standard dialogue state tracking benchmarks, and by adding previously missing details to the knowledge base. In our experiments, we examine the performance of four few-shot prompted LLMs with 70B or more parameters and four smaller models with 3B or fewer parameters. Additionally, we establish updated baselines and new evaluation metrics that were not possible with the original ABCD, covering Dialogue State Tracking (DST), Action State Tracking (AST), and Workflow Planning (WP). Interestingly, our results reveal that smaller fine-tuned models consistently outperform larger few-shot prompted models on all tasks, highlighting the room for improvement in building generalized dialogue systems.

Publication
NOW AI
Christopher Pal
Distinguished Scientist

Distinguished Scientist at AI Research Partnerships & Ecosystem, located in Montreal, QC, Canada.

Stefania Raimondo
Research Manager

Research Manager at AI Research Deployment, located in Toronto, ON, Canada.