Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels

Sai Rajeswar Mudumba, Pietro Mazzaglia, Tim Verbelen, Alexandre Piche, Aaron Courville, Alexandre Lacoste

July 2022

Abstract

Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data-hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, more so when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration for collecting experience to pre-train a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to efficiently adapt for the given tasks, achieving comparable results to task-specific baselines, while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.

Type

Workshop

Publication

Workshop at the International Conference on Machine Learning (ICML)