ServiceNow Research

Using Confounded Data in Latent Model-Based Reinforcement Learning


In the presence of confounding, naively using off-the-shelf offline reinforcement learning (RL) algorithms leads to sub-optimal behaviour. In this work, we propose a safe method to exploit confounded offline data in model-based RL, which improves the sample-efficiency of an interactive agent that collects and learns from online, unconfounded data. First, we import ideas from the well-established framework of do-calculus to express model-based RL as a causal inference problem, thus bridging the gap between the fields of RL and causality. Then, we propose a generic method for learning a causal transition model from offline and online data, which captures and corrects the confounding effect using a hidden latent variable. We prove that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the confounded offline data (in the asymptotic case), regardless of the confounding effect (the offline expert’s behaviour). We showcase our method on a series of synthetic experiments, which demonstrate that a) using confounded offline data naively degrades the sample-efficiency of an RL agent collecting and learning from online data; b) using confounded offline data correctly improves its sample-efficiency.

Transactions on Machine Learning Research
Maxime Gasse
Maxime Gasse
Research Scientist

Research Scientist at Human Decision Support located at Montreal, QC, Canada.