Countering Language Drift with KL Regularization

Issam H. Laradji, Michael Noukhovitch, Aaron Courville

October 2022

Abstract

End-to-end interactive learning of dialogue systems has been all but abandoned in favour of other approaches that use more labelled data, such as dialogue state tracking. A major issue with this approach is that using language models as speaker and listener can lead to "language drift". Models are trained only to optimize a task objective, so their intermediate language can drift from pretrained natural language to an unnatural communication protocol. We reproduce previous work on tackling this phenomenon and find that baseline methods are not as bad as reported. Furthermore, we use a simple KL regularization with an EMA model to stabilize RL training and outperform previous methods. Finally, we investigate the issue of "language drift" and find that it focuses only on the sender. We argue that "receiver drift" is equally important and show strong results on this novel metric.
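The idea of KL regularization against an EMA model can be illustrated with a toy sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the direction of the KL term, the penalty weight `beta`, the EMA `decay`, treating the EMA as an average of the policy's own parameters, and all function names are hypothetical and not taken from the paper. Distributions are plain Python lists over a tiny vocabulary rather than real language-model outputs.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def ema_update(ema_params, params, decay=0.99):
    """Move the EMA parameters a small step toward the current parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def regularized_objective(task_reward, policy_logits, ema_logits, beta=0.1):
    """Task reward minus a KL penalty toward the EMA model (assumed form)."""
    kl = kl_divergence(softmax(policy_logits), softmax(ema_logits))
    return task_reward - beta * kl


# Toy usage: the "parameters" here are just the logits of a single token choice.
policy_logits = [2.0, 0.5, -1.0]
ema_logits = [1.0, 1.0, 1.0]
objective = regularized_objective(1.0, policy_logits, ema_logits)
ema_logits = ema_update(ema_logits, policy_logits)
```

In this sketch the penalty grows as the policy moves away from its slowly moving average, which is one plausible way such a term could damp abrupt changes during RL training; the paper's actual objective may differ.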

Type

Workshop

Publication

Workshop on Interactive Learning for Natural Language Processing (NeurIPS Workshop)

Natural Language Processing

Issam H. Laradji

Research Manager

Research Manager at Frontier AI Research, located in Vancouver, BC, Canada.