ServiceNow AI Research

Reinforcement Learning

Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization
Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal …
Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning
In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint …
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
As deep reinforcement learning driven by visual perception becomes more widely used there is a growing need to better understand and …
Reinforced Active Learning for Image Segmentation
Learning-based approaches for semantic segmentation have two inherent challenges. First, acquiring pixel-wise labels is expensive and …
Learning Reward Machines for Partially Observable Reinforcement Learning
Reward Machines (RMs) provide a structured, automata-based representation of a reward function that enables a Reinforcement Learning …
Real-Time Reinforcement Learning
Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used …
Searching for Markovian Subproblems to Address Partially Observable Reinforcement Learning
In partially observable environments, an agent’s policy should often be a function of the history of its interaction with the …
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and …
Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning
Finding tight bounds on the optimal solution is a critical element of practical solution methods for discrete optimization problems. In …
Reinforced Imitation in Heterogeneous Action Space
Imitation learning is an effective alternative approach to learn a policy when the reward function is sparse. In this paper, we …