Written by Hector Palacios, research scientist
The second edition of the Bridging the Gap Between AI Planning and Reinforcement Learning workshop was held during ICAPS 2021, the International Conference on Automated Planning and Scheduling. This joint workshop brought together AI researchers who work at the intersection of AI planning and reinforcement learning, with the goal of encouraging discussion and collaboration between the two communities. Both focus on sequential decision problems, but with different emphases and methods, and often with little awareness of the other's issues, techniques, methodologies, and evaluation protocols.
Before we jump into the workshop content, I’d like to provide a bit of background for anyone new to AI planning or reinforcement learning for intelligent decision-making in the context of enterprise AI research.
AI planning vs. reinforcement learning
On the one hand, the AI planning community aims to create algorithms that achieve defined goals in specific worlds. On the other hand, the reinforcement learning (RL) community seeks to develop learning algorithms that produce a policy with a low average error rate on future instances of a specific world, but with no guarantee for any individual instance, since a given instance could be an uncommon case that a "good" policy handles poorly.
RL algorithms require that new instances come from the same world used for training: they emphasize obtaining effective policies at the expense of specializing in a particular fixed world. In contrast, AI planning emphasizes flexibility, robustness, and adaptation to new instances and worlds, at the expense of requiring a world description.
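To make the contrast concrete, here is a minimal, self-contained Python sketch on a toy chain world. It is only an illustration of the two views, not any particular library's API: the planner searches the given world model and returns a plan with a per-instance guarantee, while the Q-learner only samples transitions from that model and returns a policy that is good on average.

```python
import random
from collections import deque

# Toy world: states 0..N-1 on a line; actions move left (-1) or right (+1);
# the goal is to reach state N-1.
N = 8
ACTIONS = (-1, +1)

def step(state, action):
    """The world model: a planner may read it; an RL agent may only sample it."""
    return min(max(state + action, 0), N - 1)

# Planning view: the model is given, so breadth-first search over it returns
# a shortest plan for this exact instance, guaranteed (or None if unsolvable).
def plan(start, goal):
    frontier, parents = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            actions = []
            while parents[s] is not None:
                s, a = parents[s]
                actions.append(a)
            return actions[::-1]
        for a in ACTIONS:
            t = step(s, a)
            if t not in parents:
                parents[t] = (s, a)
                frontier.append(t)
    return None

# RL view: the model stays hidden; tabular Q-learning builds a policy from
# sampled transitions that is good on average, with no per-instance guarantee.
def q_learn(episodes=500, alpha=0.5, gamma=0.95, eps=0.2):
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(4 * N):
            a = random.choice(ACTIONS) if random.random() < eps else \
                max(ACTIONS, key=lambda b: q[(s, b)])
            t = step(s, a)  # sampled, never inspected
            r = 1.0 if t == N - 1 else 0.0
            q[(s, a)] += alpha * (r + gamma * max(q[(t, b)] for b in ACTIONS) - q[(s, a)])
            s = t
            if r:
                break
    return lambda state: max(ACTIONS, key=lambda b: q[(state, b)])

print("plan from state 0:", plan(0, N - 1))    # an explicit action sequence
policy = q_learn()
print("policy action at state 0:", policy(0))  # a state -> action mapping
```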
The state of the art in RL consists of algorithms that require applied research to be used in specific domains, sometimes with great success, as with AlphaGo, a superhuman player of the board game Go.
State-of-the-art algorithms in AI planning can obtain plans with thousands of actions for unseen world descriptions. For some problems, AI planning is particularly effective. For instance, the logistics domain is one of the many benchmarks used for evaluating planning techniques. The planning community relies on a standard language for describing the planning domain and problems: the Planning Domain Definition Language (PDDL). New domains are introduced during international planning competitions.
You can view and solve an instance of the logistics domain with the online tool planning.domains, where you can also browse other logistics instances and import other domains.
The same logistics example can be explored using Visual Studio Code. Check out the PDDLGym repo if you want to test RL algorithms in planning domains, both deterministic and probabilistic; a minimal usage sketch follows below. Documentation and educational material about planning are available at education.planning.domains and planning.wiki.
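As a rough illustration of that workflow, here is a sketch of how PDDLGym exposes a PDDL-defined domain through a Gym-style interface. The environment name and call signatures follow the project's published examples circa 2021 and may differ in other versions.

```python
import pddlgym

# PDDLGym compiles PDDL domain/problem files into Gym-style environments,
# so an RL agent can act in a planning domain without reading the model.
# The environment name and signatures below follow the project's examples
# (circa 2021) and may vary across versions.
env = pddlgym.make("PDDLEnvBlocks-v0")  # a PDDL-defined benchmark domain
obs, debug_info = env.reset()           # obs is a set of ground literals,
                                        # e.g. on(a, b), ontable(c), clear(a)
for _ in range(10):
    action = env.action_space.sample(obs)  # sampling needs the current obs
    obs, reward, done, debug_info = env.step(action)
    if done:
        break
```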
Why AI planning and RL matter for enterprise AI
First, although RL methods are becoming state of the art in many AI use cases and applications, they do not scale as well as AI planning in terms of generalizing over a family of situations in the same domain, handling changes to the domain, and providing per-instance guarantees. The relationship between learning and reasoning is at the core of this mismatch in scalability, and it calls for tighter integration to overcome the weaknesses of each family of methods.
Second, ServiceNow offers customers a unified platform for enterprise AI. A team of human agents using the Now Platform can execute multiple actions to achieve their goals, such as resolving a new IT incident. In this scenario, AI planning and RL can recommend sequences of actions that resolve the incident, whereas standard machine learning methods might be unaware of the consequences of intermediate actions.
Bridging the gap between AI planning and RL
The organizers of the second edition of the AI Planning and Reinforcement Learning (PRL) workshop accepted 25 of the 35 submitted papers and hosted five invited talks, 11 oral presentations of papers, a poster session, and five discussion sessions on topics that cut across the accepted papers. The recordings, papers, and posters are available on the PRL website. We are organizing a new edition of the PRL workshop.
Invited talks:
Danijar Hafner on “General Infomax Agents Through World Models” [Recording]
Aviv Tamar on “Learning to Plan and Planning to Learn” [Recording]
Emma Brunskill on “Careful Pessimism” [Recording]
André Barreto on “The Value Equivalence Principle for Model-Based Reinforcement Learning” [Recording]
Elias Bareinboim on “Towards Causal Reinforcement Learning” [Recording]
Discussion session topics included:
Abstractions in Planning & RL
Safe, Risk-Sensitive, and Robust Planning and RL
Domain Generalization in Planning and RL
The papers covered a wide range of problems and techniques; the full list of accepted papers is available on the PRL website.
Thank you to the co-organizers
We want to thank the co-organizers of the workshop: Hector Palacios from Element AI, a ServiceNow company; Vicenç Gómez and Anders Jonsson from Universitat Pompeu Fabra in Barcelona, Spain; Scott Sanner from the University of Toronto; Andrey Kolobov from Microsoft Research in Redmond, Washington; and Alan Fern from Oregon State University.
Related content
In addition to co-chairing the PRL workshop, Hector Palacios gave an invited talk in the main track of ICAPS 2021 titled “Planning for Controlling Business-to-business Applications” (recording).
© 2021 ServiceNow, Inc. All rights reserved. ServiceNow, the ServiceNow logo, Now, and other ServiceNow marks are trademarks and/or registered trademarks of ServiceNow, Inc. in the United States and/or other countries. Other company names, product names, and logos may be trademarks of the respective companies with which they are associated.