Planning and Reinforcement Learning workshop at ICAPS 2021

December 17, 2021

Bridging the gap between AI Planning and Reinforcement Learning - ICAPS 2021 Planning and Reinforcement Learning workshop

Written by Hector Palacios, research scientist

The second edition of the Bridging the Gap Between AI Planning and Reinforcement Learning workshop was held during ICAPS 2021, the International Conference on Automated Planning and Scheduling. This was a joint workshop for AI researchers who work at the intersection of AI planning and reinforcement learning. The goal was to encourage discussion and collaboration between these two communities. Each focuses on sequential decision problems, but with different emphases and methods and little awareness of the other’s specific issues, techniques, methodologies, and evaluation protocols.

Before we jump into the workshop content, I’d like to provide a bit of background for anyone new to AI planning or reinforcement learning for intelligent decision-making in the context of enterprise AI research.

AI planning vs. reinforcement learning

On the one hand, the AI planning community aims to create algorithms to achieve defined goals in specific worlds. On the other hand, the reinforcement learning (RL) community seeks to develop learning algorithms that produce a policy with a low average error rate in future instances of a specific world, but with no guarantees for any instance since one instance could be an uncommon case that a "good" policy ignores.

RL algorithms require that new instances come from the same world used for training. They emphasize obtaining effective policies, at the expense of specializing in a particular fixed world. In contrast, AI planning emphasizes flexibility, robustness, and adaptation to new instances and worlds, at the expense of requiring a world description.
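To make the RL side of this contrast concrete, here is a minimal sketch of tabular Q-learning on a toy five-state corridor. The world, the constants, and the names `step` and `train` are assumptions chosen for illustration; they do not come from any paper at the workshop.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2

def step(state, action):
    """Deterministic transition; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                # Greedy choice, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else GAMMA * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q

q = train()
# Greedy policy for every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The learned table is indexed by the states of this one corridor. If the goal moved to the other end, the table would have to be learned again, which is the specialization described above.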

State-of-the-art RL algorithms need applied research before they can be used in a specific domain. Sometimes the result is a great success, as with AlphaGo, a superhuman player of the board game Go.

State-of-the-art algorithms in AI planning can obtain plans with thousands of actions for unseen world descriptions. For some problems, AI planning is particularly effective. For instance, the logistics domain is one of the many benchmarks used for evaluating planning techniques. The planning community relies on a standard language for describing the planning domain and problems: the Planning Domain Definition Language (PDDL). New domains are introduced during international planning competitions.
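As a rough illustration of planning from a world description, the sketch below runs breadth-first search over a toy logistics-like domain with one truck, one package, and two locations. It is plain Python rather than PDDL, and the `Action` tuple, the fact strings, and the `plan` function are hypothetical names for this example only. Real planners use far more scalable search and heuristics.

```python
from collections import deque
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: frozenset   # facts that must hold before the action
    add_effects: frozenset     # facts made true by the action
    delete_effects: frozenset  # facts made false by the action

def plan(initial, goal, actions):
    """Breadth-first search over sets of facts; returns a shortest plan or None."""
    start, goal = frozenset(initial), frozenset(goal)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for a in actions:
            if a.preconditions <= state:
                nxt = (state - a.delete_effects) | a.add_effects
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [a.name]))
    return None

# Domain description: drive, load, and unload for a single truck and package.
locations = ["depot", "office"]
actions = []
for src in locations:
    for dst in locations:
        if src != dst:
            actions.append(Action(f"drive {src} {dst}",
                                  frozenset({f"truck-at {src}"}),
                                  frozenset({f"truck-at {dst}"}),
                                  frozenset({f"truck-at {src}"})))
    actions.append(Action(f"load {src}",
                          frozenset({f"truck-at {src}", f"pkg-at {src}"}),
                          frozenset({"pkg-in-truck"}),
                          frozenset({f"pkg-at {src}"})))
    actions.append(Action(f"unload {src}",
                          frozenset({f"truck-at {src}", "pkg-in-truck"}),
                          frozenset({f"pkg-at {src}"}),
                          frozenset({"pkg-in-truck"})))

# Two different instances solved with the same domain description.
plan_a = plan({"truck-at office", "pkg-at depot"}, {"pkg-at office"}, actions)
plan_b = plan({"truck-at depot", "pkg-at depot"}, {"pkg-at office"}, actions)
print(plan_a)
print(plan_b)
```

Both instances are solved without any training. Only the initial state changes, while the domain description stays the same.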

You can view and solve an instance of logistics using this example in an online tool, where you can also browse other logistics instances and import other domains.

PDDL Editor online tool from

The same logistics example can be explored using Visual Studio Code. Check out the repo PDDLGym if you want to test RL algorithms in planning domains, both deterministic and with probabilistic effects. Documentation and educational material about planning are available at and

Why AI planning and RL matter for enterprise AI

First, although RL methods are becoming state of the art in many AI use cases and applications, they do not scale as well as AI planning in terms of generalization over a family of situations in the same domain, changes in the domain, and guarantees per instance. The relationship between learning and reasoning is at the core of this mismatch in scalability and calls for tighter integrations to overcome the weaknesses of each family of methods.

Second, ServiceNow offers customers a unified platform for enterprise AI. A team of human agents using the Now Platform can execute multiple actions to achieve their goals—e.g., to resolve a new IT incident. In such a scenario, AI planning and RL can provide AI methods to recommend sequences of actions that resolve such incidents. In contrast, standard machine learning methods might be unaware of the consequences of intermediate actions.

Bridging the gap between AI planning and RL

The organizers of the second edition of the AI planning and reinforcement learning (PRL) workshop accepted 25 of the 35 papers submitted and hosted five invited talks, 11 oral presentations of papers, a poster session, and five discussion sessions on topics that cut across the accepted papers. The recordings, papers, and posters are available on the PRL website. We are organizing a new edition of the PRL workshop.

Invited talks:

Discussion session topics included:

  • Abstractions in Planning & RL

  • Safe, Risk-Sensitive, and Robust Planning and RL

  • Domain Generalization in Planning and RL

The papers detailed the following problems and techniques:

  • Problems considered:

    • How to improve RL generalization in the same domain/world
    • How to use planning models to improve RL
    • How to ensure safe RL and verify RL guarantees

  • Techniques used:

    • Planning and RL algorithms
    • World simulators, some of them using a hidden planning model as ground truth
    • Graph neural networks (GNNs)
    • Optimization under constraints
    • Hierarchical representations

Thank you to the co-organizers

We want to thank the co-organizers of the workshop: Hector Palacios from Element AI, a ServiceNow company; Vicenç Gómez and Anders Jonsson from Universitat Pompeu Fabra in Barcelona, Spain; Scott Sanner from the University of Toronto; Andrey Kolobov from Microsoft Research in Redmond, Washington; and Alan Fern from Oregon State University.

Related content

In addition to being the co-chair of the PRL workshop, Hector Palacios participated in the main track of ICAPS 2021 with an invited talk titled “Planning for Controlling Business-to-business Applications” (recording).

© 2021 ServiceNow, Inc. All rights reserved. ServiceNow, the ServiceNow logo, Now, and other ServiceNow marks are trademarks and/or registered trademarks of ServiceNow, Inc. in the United States and/or other countries. Other company names, product names, and logos may be trademarks of the respective companies with which they are associated.
