ServiceNow Research

Theory of Machine Learning

Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks
Both the PAC-Bayesian and Sample Compression learning frameworks have proven instrumental for deriving tight (non-vacuous) generalization …
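For context, the flavour of guarantee both frameworks provide can be illustrated by the classical McAllester-style PAC-Bayesian bound (a textbook form, not the specific bound derived in this work): for any prior P over predictors, with probability at least 1 - δ over an i.i.d. sample S of size n, simultaneously for all posteriors Q,

    \mathbb{E}_{h \sim Q}[R(h)] \;\le\; \mathbb{E}_{h \sim Q}[\widehat{R}_S(h)]
        + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(2\sqrt{n}/\delta)}{2n}} .
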
Fast Convergence of Softmax Policy Mirror Ascent for Bandits & Tabular MDPs
We analyze the convergence of a novel policy gradient algorithm (referred to as SPMA) for multi-armed bandits and tabular Markov …
Sample compression unleashed: New generalization bounds for real-valued losses
The sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the …
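To make this concrete, a classical Hoeffding-style sample compression bound for the zero-one loss reads as follows (the standard form; this work's contribution is extending such guarantees to real-valued losses): with probability at least 1 - δ over an i.i.d. sample S of size n, simultaneously for every compression-set size k and every subset S_i ⊆ S of size k with reconstructed predictor h_{S_i},

    R(h_{S_i}) \;\le\; \widehat{R}_{S \setminus S_i}(h_{S_i})
        + \sqrt{\frac{\ln\binom{n}{k} + \ln(n/\delta)}{2(n - k)}} ,

where the empirical risk is evaluated only on the n - k points left out of the compression set.
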
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
Pretraining large language models (LLMs) or adapting them to new tasks and domains has become increasingly critical as their …
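The mechanism named in the title, updating only a chosen block of coordinates so that optimizer state is kept for a small subset of parameters, can be sketched as follows (a hypothetical PyTorch-style illustration with a made-up gradient-norm selection heuristic, not BlockLLM's actual procedure):

    import torch

    def select_block(model, frac=0.05):
        # Score each parameter tensor by gradient norm and keep the top fraction.
        # (Illustrative heuristic only; BlockLLM's selection rule and schedule
        # are described in the paper.)
        scored = [(p.grad.norm().item(), p) for p in model.parameters()
                  if p.grad is not None]
        scored.sort(key=lambda t: t[0], reverse=True)
        k = max(1, int(len(scored) * frac))
        return [p for _, p in scored[:k]]

    def block_step(model, batch, loss_fn, lr=1e-4):
        loss = loss_fn(model(batch["x"]), batch["y"])
        loss.backward()
        block = select_block(model)
        # Optimizer state (e.g., Adam moments) is allocated only for the
        # selected block, which is where the memory savings come from.
        torch.optim.Adam(block, lr=lr).step()
        model.zero_grad(set_to_none=True)
        return loss.item()

A real implementation would also persist the optimizer state while the selected block stays fixed and refresh the block only periodically.
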
Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning
Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small …
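As a toy illustration of a reconstruction function (a hypothetical nearest-centroid example, not the hypernetwork proposed in this work), note how the predictor below is fully determined by the few retained samples that form the compression set:

    import numpy as np

    def reconstruct(compression_set):
        # Reconstruction function: map a small labelled subset (the compression
        # set) to a full predictor, here a nearest-centroid classifier.
        xs = np.array([x for x, _ in compression_set])
        ys = np.array([y for _, y in compression_set])
        labels = np.unique(ys)
        centroids = np.stack([xs[ys == c].mean(axis=0) for c in labels])

        def predict(x):
            return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

        return predict

    # The predictor is fully defined by the compression set alone:
    h = reconstruct([(np.array([0.0, 0.0]), 0), (np.array([1.0, 1.0]), 1)])
    print(h(np.array([0.2, 0.1])))  # -> 0
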
Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones
Early Exiting (EE) is a promising technique for speeding up inference with only a limited loss in performance. It adaptively allocates …
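The basic mechanism can be sketched in a few lines (a minimal illustration assuming a model split into blocks with auxiliary classifier heads and a single input example; the performance-control procedure introduced in this work goes beyond a fixed confidence threshold):

    import torch

    @torch.no_grad()
    def early_exit_forward(blocks, heads, x, threshold=0.9):
        # Run the blocks sequentially; after each block, an auxiliary head makes
        # a prediction, and inference stops as soon as its confidence clears the
        # threshold (assumes a single example, i.e. batch size 1).
        h = x
        for i, (block, head) in enumerate(zip(blocks, heads)):
            h = block(h)
            probs = torch.softmax(head(h), dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= threshold:
                return pred, i          # early exit: later blocks never run
        return pred, len(blocks) - 1    # no exit triggered; use the final head
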
Generalization bounds with arbitrary complexity measures
In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the considered theoretical …
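As an example of such an imposed measure, the standard Rademacher-complexity bound (a textbook result, not the arbitrary-measure bound derived in this work) states that for a class F of [0, 1]-valued loss functions, with probability at least 1 - δ over an i.i.d. sample of size n, simultaneously for all f in F,

    \mathbb{E}[f] \;\le\; \widehat{\mathbb{E}}_n[f] + 2\,\mathfrak{R}_n(F)
        + \sqrt{\frac{\ln(1/\delta)}{2n}} ,

so the complexity term is dictated by the framework (here, the Rademacher complexity \mathfrak{R}_n(F)) rather than freely chosen.
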
Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism
Optimizing large, memory-intensive neural networks requires distributing their layers across multiple GPUs (referred to as model …
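For reference, the baseline setup the abstract alludes to, placing consecutive groups of layers on different GPUs, looks roughly like this generic PyTorch sketch (model parallelism only, not the surrogate-minimization algorithm itself):

    import torch.nn as nn

    class TwoGPUNet(nn.Module):
        # Naive model parallelism: the first half of the layers lives on cuda:0,
        # the second half on cuda:1, and activations move between devices.
        def __init__(self, layers):
            super().__init__()
            half = len(layers) // 2
            self.part0 = nn.Sequential(*layers[:half]).to("cuda:0")
            self.part1 = nn.Sequential(*layers[half:]).to("cuda:1")

        def forward(self, x):
            x = self.part0(x.to("cuda:0"))
            return self.part1(x.to("cuda:1"))

Each GPU holds only its own layers' parameters and optimizer state, which is what makes training memory-intensive networks feasible at the cost of inter-device communication.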