ServiceNow Research

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning


In this paper we propose Reward Machines – a type of finite state machine that supports the spec- ification of reward functions while exposing re- ward function structure to the learner and support- ing decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal pol- icy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete do- mains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better poli- cies more quickly than hierarchical methods in a domain with a continuous state space.

International Conference on Machine Learning (ICML)