ServiceNow Research

Using Confounded Data in Offline RL

Abstract

In this work, we consider the problem of confounding in offline RL, also referred to as the delusion problem. While it is known that learning from purely offline data is a hazardous endeavor in the presence of confounding, in this paper we show that offline, confounded data can be safely combined with online, non-confounded data to improve the sample efficiency of model-based RL. We import ideas from the well-established framework of $do$-calculus to express model-based RL as a causal inference problem, thus bridging the fields of RL and causality. We propose a latent-based method that we prove is correct and efficient: in the asymptotic case, it attains better generalization guarantees thanks to the offline, confounded data, regardless of the expert's behavior. We illustrate the effectiveness of our method on a series of synthetic experiments.
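To make the delusion problem concrete, here is a minimal sketch on a hypothetical one-step toy model. It is not the paper's latent-based method. Assumptions: a binary hidden confounder `U` that the expert observes but the agent does not, an expert who always picks the action matching `U`, and success exactly when the action matches `U`. Offline expert data then suggests that every action succeeds with probability 1. Online interaction, which measures $P(Y \mid do(A))$, reveals that each action succeeds only half the time. The last function applies a standard identity (the "natural bounds" on $P(y \mid do(a))$, also not the paper's method) to show that confounded data still constrains the interventional quantity. This hints at why it can complement online data.

```python
from fractions import Fraction

# Hypothetical toy model (not from the paper): hidden confounder U ~ Bernoulli(1/2),
# seen by the expert but not by the learning agent.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def expert_policy(a, u):
    """P(A=a | U=u): the expert always picks the action that matches U."""
    return Fraction(1) if a == u else Fraction(0)

def outcome(y, a, u):
    """P(Y=y | A=a, U=u): success (y=1) iff the action matches U."""
    success = 1 if a == u else 0
    return Fraction(1) if y == success else Fraction(0)

def p_joint(y, a):
    """P(Y=y, A=a) under the expert's (confounded) data distribution."""
    return sum(P_U[u] * expert_policy(a, u) * outcome(y, a, u) for u in P_U)

def p_action(a):
    """P(A=a) under the expert's data distribution."""
    return sum(P_U[u] * expert_policy(a, u) for u in P_U)

def observational(y, a):
    """P(Y=y | A=a): what a naive model fit on offline expert data estimates."""
    return p_joint(y, a) / p_action(a)

def interventional(y, a):
    """P(Y=y | do(A=a)): the agent picks A itself, so U keeps its prior."""
    return sum(P_U[u] * outcome(y, a, u) for u in P_U)

def natural_bounds(y, a):
    """Bounds on P(Y=y | do(A=a)) computable from the observational P(Y, A) alone."""
    low = p_joint(y, a)
    return low, low + (1 - p_action(a))

for a in (0, 1):
    print(f"a={a}: offline P(Y=1|A=a)={observational(1, a)}, "
          f"online P(Y=1|do(A=a))={interventional(1, a)}, "
          f"bounds={natural_bounds(1, a)}")
```

For both actions the offline estimate is 1, while the interventional probability is 1/2. An agent that trusted the offline model alone would be deluded. The interventional value still falls inside the bounds derived from the offline data.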

Publication
Workshop at the Conference on Neural Information Processing Systems (NeurIPS)
Maxime Gasse

Research Scientist at Human Decision Support, located in Montreal, QC, Canada.