ServiceNow Research

Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels

Abstract

Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data-hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, especially when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration to collect experience for pre-training a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to adapt efficiently to the given tasks, achieving results comparable to task-specific baselines while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.
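
The two-phase recipe described in the abstract (reward-free data collection to pre-train a model, then planning with that model on a downstream task) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the 1D toy dynamics, the linear least-squares model standing in for a pixel-based world model, random exploration standing in for the exploration methods compared in the paper, and a plain random-shooting planner standing in for the paper's hybrid planner. All function names are hypothetical.

```python
import random

def true_step(state, action):
    # Hidden toy dynamics (assumed); the agent only observes sampled transitions.
    return state + 0.5 * action

def collect_reward_free(num_steps, rng):
    # Phase 1: explore with random actions; no task reward is used.
    data, state = [], 0.0
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)
        next_state = true_step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data

def fit_linear_model(data):
    # Least squares for next_state ~ w_s * state + w_a * action,
    # solved through the 2x2 normal equations.
    ss = sum(s * s for s, a, n in data)
    sa = sum(s * a for s, a, n in data)
    aa = sum(a * a for s, a, n in data)
    sn = sum(s * n for s, a, n in data)
    an = sum(a * n for s, a, n in data)
    det = ss * aa - sa * sa
    w_s = (sn * aa - an * sa) / det
    w_a = (an * ss - sn * sa) / det
    return w_s, w_a

def plan(state, goal, model, rng, horizon=3, candidates=500):
    # Random shooting: score random action sequences inside the learned
    # model and return the first action of the best-scoring sequence.
    w_s, w_a = model
    best_action, best_return = 0.0, float("-inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = w_s * s + w_a * a
            total += -(s - goal) ** 2  # downstream reward: stay close to goal
        if total > best_return:
            best_action, best_return = actions[0], total
    return best_action

def run_downstream(goal, model, rng, steps=30):
    # Phase 2: act in the real environment, replanning at every step.
    state = 0.0
    for _ in range(steps):
        state = true_step(state, plan(state, goal, model, rng))
    return state

if __name__ == "__main__":
    rng = random.Random(0)
    model = fit_linear_model(collect_reward_free(200, rng))
    print("learned model:", model)
    print("final state:", run_downstream(2.0, model, rng))
```

The point of the sketch is the separation of concerns: the model is fit once from task-agnostic data, and the downstream goal only enters through the planner's reward, so a new goal needs no new data collection in this toy setting.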

Publication
Workshop at the International Conference on Machine Learning (ICML)
Sai Rajeswar Mudumba
Research Scientist

Research Scientist at Human Decision Support, located in Montreal, QC, Canada.

Alexandre Piche
Research Scientist

Research Scientist at Human Decision Support, located in Montreal, QC, Canada.

Alexandre Lacoste
Research Scientist

Research Scientist at Human Decision Support, located in Montreal, QC, Canada.