ServiceNow Research

Optimization

DAG Learning via Sparse Relaxations

We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our …
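As a point of reference, continuous DAG-learning methods in this line of work typically enforce acyclicity through a smooth penalty such as the NOTEARS trace-exponential characterization (Zheng et al., 2018). The sketch below shows only that baseline penalty; it is not necessarily the sparse relaxation proposed in this paper.

```python
# A minimal sketch of the NOTEARS-style acyclicity function used as a
# constraint/penalty in continuous DAG learning (baseline, not this paper's
# specific sparse relaxation).
import numpy as np
from scipy.linalg import expm

def acyclicity(W: np.ndarray) -> float:
    """h(W) = tr(exp(W * W)) - d; equals 0 iff the weighted adjacency
    matrix W encodes a DAG (elementwise square keeps entries non-negative)."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])   # edge 0 -> 1, acyclic
W_cyc = np.array([[0.0, 1.5], [0.7, 0.0]])   # 0 <-> 1, contains a cycle
print(acyclicity(W_dag))   # ~0.0
print(acyclicity(W_cyc))   # > 0
```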

Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, …
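The canonical greedy rule for BCD is Gauss-Southwell selection: at each iteration, update the block whose partial gradient has the largest norm. The sketch below shows that baseline rule on a smooth problem; the faster greedy rules, message-passing schemes, and active-set analysis from the paper are not reproduced here.

```python
# A minimal sketch of greedy (Gauss-Southwell) block selection for block
# coordinate descent on a smooth function; problem setup and block sizes
# are illustrative assumptions.
import numpy as np

def bcd_gauss_southwell(grad, x, blocks, lr=0.1, iters=100):
    """Update only the block whose gradient has the largest norm,
    instead of cycling through blocks or sampling them at random."""
    for _ in range(iters):
        g = grad(x)
        norms = [np.linalg.norm(g[b]) for b in blocks]
        b = blocks[int(np.argmax(norms))]   # greedy block choice
        x[b] -= lr * g[b]                   # gradient step on that block
    return x

# Example: quadratic f(x) = 0.5 * ||A x - y||^2 with two coordinate blocks.
rng = np.random.default_rng(0)
A, y = rng.normal(size=(20, 6)), rng.normal(size=20)
grad = lambda x: A.T @ (A @ x - y)
x = bcd_gauss_southwell(grad, np.zeros(6), [slice(0, 3), slice(3, 6)])
```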
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although …
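For an individual loss f_i with known minimum f_i^*, the stochastic Polyak step-size takes the form gamma_k = (f_i(x_k) - f_i^*) / (c * ||grad f_i(x_k)||^2), usually capped by a maximum step. The sketch below illustrates one such SGD step; the constant c, the cap, and the handling of f_i^* are simplifications of the paper's variants.

```python
# A minimal sketch of one SGD step with a stochastic Polyak step-size;
# the values of c and gamma_max are illustrative assumptions.
import numpy as np

def sps_step(x, f_i, grad_i, f_i_star=0.0, c=0.5, gamma_max=10.0):
    g = grad_i(x)
    gamma = (f_i(x) - f_i_star) / (c * np.dot(g, g) + 1e-12)
    return x - min(gamma, gamma_max) * g

# Example: one step on a single least-squares loss f_i(x) = 0.5 * (a^T x - b)^2,
# whose minimum value under interpolation is f_i^* = 0.
a, b = np.array([1.0, -2.0]), 3.0
f_i = lambda x: 0.5 * (a @ x - b) ** 2
grad_i = lambda x: (a @ x - b) * a
x = sps_step(np.zeros(2), f_i, grad_i)
```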
Learning Data Augmentation with Online Bilevel Optimization for Image Classification
Data augmentation is a key practice in machine learning for improving generalization performance. However, finding the best data …
AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
Entropy is ubiquitous in machine learning, but it is in general intractable to compute the entropy of the distribution of an arbitrary …
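For a reparameterizable density q_phi with sampler x = g_phi(z), the entropy gradient can be written as an expectation over the score ∇_x log q_phi(x), the quantity a denoising autoencoder can approximate (Alain and Bengio, 2014). The identity below is the standard starting point for such estimators, not this paper's specific amortized construction:

$$\nabla_\phi \mathcal{H}(q_\phi) = -\,\mathbb{E}_{z}\!\left[\left(\frac{\partial g_\phi(z)}{\partial \phi}\right)^{\!\top} \nabla_x \log q_\phi(x)\,\Big|_{x=g_\phi(z)}\right], \qquad \nabla_x \log q_\sigma(x) \approx \frac{r_\sigma(x) - x}{\sigma^2},$$

where r_sigma is a denoising autoencoder trained at noise level sigma and q_sigma is the corresponding noise-smoothed density.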
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition …
A Closer Look at the Optimization Landscapes of Generative Adversarial Networks
Generative adversarial networks have been very successful in generative modeling; however, they remain relatively challenging to train …
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for …
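A representative instance of such a stochastic line search is backtracking Armijo on the sampled mini-batch: shrink the step size until the mini-batch loss decreases sufficiently, then take the step. The sketch below shows that basic rule only; the step-size reset heuristics and theoretical safeguards studied in the paper are omitted.

```python
# A minimal sketch of a backtracking (Armijo) line search evaluated on the
# same mini-batch loss f_i that produced the gradient; eta, c, and beta are
# illustrative defaults.
import numpy as np

def sgd_armijo_step(x, f_i, grad_i, eta=1.0, c=0.1, beta=0.5, max_backtracks=50):
    """Shrink eta until the Armijo condition holds on the mini-batch:
    f_i(x - eta * g) <= f_i(x) - c * eta * ||g||^2."""
    g = grad_i(x)
    fx, gg = f_i(x), np.dot(g, g)
    for _ in range(max_backtracks):
        if f_i(x - eta * g) <= fx - c * eta * gg:
            break
        eta *= beta
    return x - eta * g, eta
```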
Reducing Noise in GAN Training with Variance Reduced Extragradient
We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can …
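The extragradient method, which the paper combines with variance reduction, first extrapolates to a mid-point and then updates using the mid-point gradients. The sketch below shows only the deterministic, full-batch extragradient update on a toy bilinear game, not the variance-reduced stochastic version proposed in the paper.

```python
# A minimal sketch of the deterministic extragradient update for a two-player
# min-max problem (min over x, max over y); the step size eta is illustrative.
def extragradient_step(x, y, grad_x, grad_y, eta=0.1):
    """Extrapolate to a mid-point, then take the actual step using the
    gradients evaluated at that mid-point."""
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    x_new = x - eta * grad_x(x_half, y_half)
    y_new = y + eta * grad_y(x_half, y_half)
    return x_new, y_new

# Example: the bilinear game min_x max_y x*y, where simultaneous
# gradient descent-ascent cycles/diverges but extragradient converges to (0, 0).
grad_x = lambda x, y: y   # d/dx (x*y)
grad_y = lambda x, y: x   # d/dy (x*y)
x, y = 1.0, 1.0
for _ in range(200):
    x, y = extragradient_step(x, y, grad_x, grad_y)
```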
Efficient Deep Gaussian Process Models for Variable-Sized Inputs
Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their …