ServiceNow Research

Optimization

Fast Convergence of Softmax Policy Mirror Ascent for Bandits & Tabular MDPs

We analyze the convergence of a novel policy gradient algorithm (referred to as SPMA) for multi-armed bandits and tabular Markov …

Bidding in day-ahead electricity markets: A dynamic programming framework
Strategic bidding problems have attracted considerable attention with the introduction of deregulated electricity markets where producers and …
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their …

Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
Direct Preference Optimization (DPO) is an effective technique that leverages pairwise preference data (usually one chosen and rejected …
Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones
Early Exiting (EE) is a promising technique for speeding up inference at the cost of limited performance loss. It adaptively allocates …
Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism
Optimizing large memory-intensive neural networks requires distributing their layers across multiple GPUs (referred to as model …
On Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex …
Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, …
DAG Learning on the Permutahedron

We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our …