ServiceNow Research

Planning with Latent Simulated Trajectories

Abstract

In this work, we draw connections between planning and latent variable models. Specifically, planning can be seen as introducing latent future optimal trajectories to improve the estimation of the agent's policy. This insight allows us to improve two model-based reinforcement learning (RL) algorithms: the Cross-Entropy Method (CEM) and Sequential Monte Carlo Planning (SMCP). Finally, we demonstrate that our methods learn faster and achieve higher performance in early training on a continuous control benchmark.
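
For readers unfamiliar with CEM, the following is a minimal sketch of the standard, unmodified cross-entropy method used as a planner. It is not the improved variant proposed in this work. The toy one-dimensional dynamics, the reward, and every name and parameter below (`rollout_return`, `cem_plan`, `horizon`, `n_elite`, and so on) are assumptions made purely for illustration.

```python
import numpy as np


def rollout_return(state, actions, target=1.0):
    """Total reward of an action sequence under a toy 1-D model (hypothetical)."""
    total = 0.0
    for a in actions:
        state = state + 0.1 * a          # assumed known dynamics
        total -= (state - target) ** 2   # reward: stay close to the target
    return total


def cem_plan(state, horizon=10, n_samples=200, n_elite=20, n_iters=5, seed=0):
    """Vanilla CEM: repeatedly refit a Gaussian over action sequences to the elites."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(horizon)
    std = np.ones(horizon)
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = mean + std * rng.standard_normal((n_samples, horizon))
        samples = np.clip(samples, -1.0, 1.0)  # assumed action bounds
        # Score each candidate by simulating it through the model.
        returns = np.array([rollout_return(state, s) for s in samples])
        # Keep the highest-return candidates and refit the distribution.
        elite = samples[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a degenerate Gaussian
    return mean


plan = cem_plan(state=0.0)
```

In this sketch the sampled action sequences play the role of candidate future trajectories, and the elite set is the subset treated as near-optimal when refitting the sampling distribution.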

Publication
Workshop at the International Conference on Learning Representations (ICLR)
Alexandre Piche
Research Scientist

Research Scientist at Human Machine Interaction Through Language, located in Montreal, QC, Canada.

Christopher Pal
Distinguished Scientist

Distinguished Scientist at Low Data Learning, located in Montreal, QC, Canada.