ServiceNow Research

A Probabilistic Perspective on Reinforcement Learning via Supervised Learning

Abstract

Reinforcement Learning via Supervised Learning (RvS) only uses supervised techniques to learn desirable behaviors from large datasets. RvS has attracted much attention lately due to its simplicity and ability to leverage diverse trajectories. We introduce Density to Decision (D2D), a new framework, to unify a myriad of RvS algorithms. The Density to Decision framework formulates RvS as a two-step process: i) density estimation via supervised learning and ii) decision making via exponential tilting of the density. Using our framework, we categorise popular RvS algorithms and show how they are different by the design choices in their implementation. We then introduce a novel algorithm, Implicit RvS, leveraging powerful density estimation techniques that can easily be tilted to produce desirable behaviors. We compare the performance of a suite of RvS algorithms on the D4RL benchmark. Finally, we highlight the limitations of current RvS algorithms in comparison with traditional RL ones.

Publication
Workshop at the International Conference on Learning Representations (ICLR)
Alexandre Piche
Alexandre Piche
Research Scientist

Research Scientist at Human Decision Support located at Montreal, QC, Canada.

Rafael Pardinas
Rafael Pardinas
Applied Research Scientist

Applied Research Scientist at Human Decision Support located at London, UK.

David Vazquez
David Vazquez
Manager of Research Programs

Manager of Research Programs at Research Management located at Montreal, QC, Canada.

Christopher Pal
Christopher Pal
Distinguished Scientist

Distinguished Scientist at Low Data Learning located at Montreal, QC, Canada.