In this work, we draw connections between planning and latent variable models. Specifically, planning can be seen as introducing latent future optimal trajectories to improve the estimation of the agent's policy. This insight allows us to improve two model-based reinforcement learning (RL) algorithms: the Cross-Entropy Method (CEM) and Sequential Monte Carlo Planning (SMCP). Finally, we demonstrate that our methods learn faster and achieve higher performance early in training on a continuous control benchmark.
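
For context, the following is a minimal sketch of vanilla CEM planning with a learned model, not the modified variant proposed here; the function names, hyperparameters, and toy dynamics are illustrative assumptions only.

```python
import numpy as np

def cem_plan(dynamics, reward, state, horizon=10, action_dim=2,
             n_samples=500, n_elites=50, n_iters=5, seed=0):
    """Plan with the Cross-Entropy Method (CEM): repeatedly sample action
    sequences from a Gaussian, score them under a (learned) model, and
    refit the Gaussian to the top-scoring "elite" sequences."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))

    for _ in range(n_iters):
        # Sample candidate action sequences: (n_samples, horizon, action_dim).
        actions = mean + std * rng.standard_normal((n_samples, horizon, action_dim))

        # Roll each candidate forward through the model and accumulate reward.
        returns = np.zeros(n_samples)
        for i in range(n_samples):
            s = state
            for t in range(horizon):
                returns[i] += reward(s, actions[i, t])
                s = dynamics(s, actions[i, t])

        # Refit the sampling distribution to the elite sequences.
        elites = actions[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6

    # Execute only the first action of the refined mean sequence (MPC-style).
    return mean[0]

# Toy usage with placeholder linear dynamics and a quadratic reward.
dyn = lambda s, a: s + 0.1 * a
rew = lambda s, a: -np.sum(s ** 2) - 0.01 * np.sum(a ** 2)
first_action = cem_plan(dyn, rew, state=np.array([1.0, -1.0]))
```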