Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning

Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith

July 2018

Abstract

In this paper we propose Reward Machines – a type of finite state machine that supports the spec- ification of reward functions while exposing re- ward function structure to the learner and support- ing decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal pol- icy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete do- mains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better poli- cies more quickly than hierarchical methods in a domain with a continuous state space.

Type

Conference paper

Publication

International Conference on Machine Learning (ICML)

Reinforcement Learning