ServiceNow AI Research

Theory of Machine Learning

Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism
Optimizing large, memory-intensive neural networks requires distributing their layers across multiple GPUs (referred to as model …
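To make the setting concrete: in model parallelism, consecutive blocks of layers live on different devices, and activations must cross device boundaries during the forward pass. Below is a minimal PyTorch sketch of that setup only, not the paper's surrogate-minimization algorithm; the two-block split, layer sizes, and device IDs are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    """Minimal model parallelism: each block of layers lives on its own GPU."""
    def __init__(self):
        super().__init__()
        # First block of layers on GPU 0, second block on GPU 1.
        self.block1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.block2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        # Activations are moved between devices at the block boundary.
        h = self.block1(x.to("cuda:0"))
        return self.block2(h.to("cuda:1"))
```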
On Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex …
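As a concrete instance of SMD (a sketch of the classic method, not the paper's adaptive variants): with the negative-entropy mirror map over the probability simplex, the mirror step is the exponentiated-gradient update w ← w · exp(−η g) followed by renormalization. The least-squares example below is interpolated by construction, i.e. every stochastic loss is minimized at the same point, matching the interpolation setting the abstract mentions.

```python
import numpy as np

def smd_simplex(stoch_grad, w0, steps=2000, eta=0.1, seed=0):
    """Stochastic mirror descent over the simplex with the negative-entropy
    mirror map, i.e. exponentiated gradient: w <- w * exp(-eta * g) / Z."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(steps):
        g = stoch_grad(w, rng)      # stochastic gradient at the current point
        w = w * np.exp(-eta * g)    # mirror (dual-space) step
        w /= w.sum()                # normalize back onto the simplex
    return w

# Interpolated least squares: b = A @ w_true, so every per-sample loss
# (a_i @ w - b_i)^2 is zero at w_true.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
w_true = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
b = A @ w_true

def stoch_grad(w, rng):
    i = rng.integers(len(b))                  # sample one data point
    return 2.0 * (A[i] @ w - b[i]) * A[i]     # gradient of f_i at w

w_hat = smd_simplex(stoch_grad, np.ones(5) / 5)
```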
Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, …
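For intuition on greedy selection rules, here is a single-coordinate sketch of the classic Gauss-Southwell rule that the paper's faster block rules build on (an illustration, not the paper's method): at each step, update the coordinate whose partial derivative is largest in magnitude, paying only a cheap O(n) gradient refresh per iteration on a quadratic.

```python
import numpy as np

def greedy_cd_quadratic(A, b, steps=500):
    """Greedy (Gauss-Southwell) coordinate descent on
    f(x) = 0.5 * x.T @ A @ x - b @ x, with A symmetric positive definite."""
    x = np.zeros(len(b))
    grad = A @ x - b                     # gradient of f at x
    for _ in range(steps):
        i = np.argmax(np.abs(grad))      # greedy rule: largest partial derivative
        step = grad[i] / A[i, i]         # exact minimization along coordinate i
        x[i] -= step
        grad -= step * A[:, i]           # O(n) gradient update, no full recompute
    return x

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 20))
A = M.T @ M + np.eye(20)                 # symmetric positive definite
x_star = greedy_cd_quadratic(A, rng.normal(size=20))
```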
DAG Learning on the Permutahedron
We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our …
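The key combinatorial object here is a topological order: fixing a permutation of the nodes and keeping only edges that go "forward" in that order yields an acyclic graph by construction, and the framework relaxes the discrete search over permutations to continuous optimization over the permutahedron (their convex hull). A minimal sketch of the order-to-DAG masking step only; the continuous relaxation itself is beyond this snippet.

```python
import numpy as np

def dag_mask_from_order(order):
    """Given a topological order (a permutation of node indices), return the
    binary mask of edges i -> j allowed under that order: rank(i) < rank(j)."""
    rank = np.empty(len(order), dtype=int)
    rank[np.asarray(order)] = np.arange(len(order))
    return (rank[:, None] < rank[None, :]).astype(float)

# Any weighted adjacency masked this way is acyclic by construction.
W = np.random.default_rng(0).normal(size=(4, 4))
order = [2, 0, 3, 1]                      # node 2 first, node 1 last
W_dag = W * dag_mask_from_order(order)
```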
Deep Hyperbolic Reinforcement Learning for Continuous Control
Integrating hyperbolic representations with Deep Reinforcement Learning (DRL) has recently been proposed as a promising approach for …
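One standard way to obtain hyperbolic representations (a sketch of the common embedding step, not necessarily this paper's exact architecture) is to map a network's Euclidean feature vector onto the Poincaré ball via the exponential map at the origin.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-8):
    """Exponential map at the origin of the Poincare ball with curvature -c:
    exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||).
    Maps Euclidean features into the open ball of radius 1 / sqrt(c)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

z = expmap0(np.random.default_rng(0).normal(size=(4, 8)))
assert (np.linalg.norm(z, axis=-1) < 1.0).all()   # inside the unit ball (c=1)
```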