ServiceNow Research

Planning with Latent Simulated Trajectories

Abstract

In this work, we draw connections between planning and latent variable models. Specifically, planning can be seen as introducing latent future optimal trajectories to improve the estimation of the agent’s policy. This insight allows us to improve two model-based reinforcement learning (RL) algorithms: the Cross Entropy Method (CEM) and Sequential Monte Carlo Planning (SMCP). Finally, we demonstrate that our methods learn faster and achieve higher performance early in training on a continuous control benchmark.
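For context, one of the two baselines the paper builds on, vanilla CEM planning, can be sketched in a few lines. The sketch below is illustrative only, not the authors' latent-trajectory variant; the dynamics and reward functions are hypothetical stand-ins for a learned model, and all hyperparameter values are placeholders.

    import numpy as np

    def cem_plan(dynamics, reward, state, horizon=10, action_dim=2,
                 n_samples=100, n_elites=10, n_iters=5):
        """Plan with the Cross Entropy Method (CEM).

        `dynamics(state, action) -> next_state` and
        `reward(state, action) -> float` are assumed, hypothetical
        single-step models of the environment.
        """
        mean = np.zeros((horizon, action_dim))
        std = np.ones((horizon, action_dim))

        for _ in range(n_iters):
            # Sample candidate action sequences from the current Gaussian.
            actions = mean + std * np.random.randn(n_samples, horizon, action_dim)

            # Roll each candidate through the model and accumulate its return.
            returns = np.zeros(n_samples)
            for i in range(n_samples):
                s = state
                for t in range(horizon):
                    returns[i] += reward(s, actions[i, t])
                    s = dynamics(s, actions[i, t])

            # Refit the sampling distribution to the top-scoring (elite) sequences.
            elites = actions[np.argsort(returns)[-n_elites:]]
            mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6

        # Execute only the first action of the best plan (receding horizon).
        return mean[0]

The paper's reading of this loop is that the elite trajectories play the role of latent future optimal trajectories, which is what motivates the latent variable formulation.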

Publication
Workshop at the International Conference on Learning Representations (ICLR)
Alexandre Piche
Research Scientist

Research Scientist at Human Decision Support, located in Montreal, QC, Canada.

Cyril Ibrahim
AI Developer

AI Developer at Emerging Technologies Lab, located in Montreal, QC, Canada.

Christopher Pal
Distinguished Scientist

Distinguished Scientist at Low Data Learning, located in Montreal, QC, Canada.