Adam M. Oberman

h-index 26

40 papers

2,669 citations

Novelty 45%

AI Score 35

Ranked #103,891 of 194,257 authors (top 53%) · #666 in NA (top 27%)

40 Papers

20.5 · LG · Mar 1, 2022

On the Generalization of Representations in Reinforcement Learning

Charline Le Lan, Stephen Tu, Adam Oberman et al. · deepmind

In reinforcement learning, state representations are used to tractably deal with large problem spaces. State representations serve both to approximate the value function with few parameters and to generalize to newly encountered states. Their features may be learned implicitly (as part of a neural network) or explicitly (for example, the successor representation of Dayan, 1993). While the approximation properties of representations are reasonably well-understood, a precise characterization of how and when these representations generalize is lacking. In this work, we address this gap and provide an informative bound on the generalization error arising from a specific state representation. This bound is based on the notion of effective dimension which measures the degree to which knowing the value at one state informs the value at other states. Our bound applies to any state representation and quantifies the natural tension between representations that generalize well and those that approximate well. We complement our theoretical results with an empirical survey of classic representation learning methods from the literature and results on the Arcade Learning Environment, and find that the generalization behaviour of learned representations is well-explained by their effective dimension.

2.3 · NA · Aug 23, 2012

Numerical solution of the Optimal Transportation problem using the Monge-Ampere equation

Jean-David Benamou, Brittany D. Froese, Adam M. Oberman

A numerical method for the solution of the elliptic Monge-Ampere Partial Differential Equation, with boundary conditions corresponding to the Optimal Transportation (OT) problem is presented. A local representation of the OT boundary conditions is combined with a finite difference scheme for the Monge-Ampere equation. Newton's method is implemented leading to a fast solver, comparable to solving the Laplace equation on the same grid several times. Theoretical justification for the method is given by a convergence proof in the companion paper (Benamou et al., 2012). In this paper, the algorithm is modified to a simpler compact stencil implementation and details of the implementation are given. Solutions are computed with densities supported on non-convex and disconnected domains. Computational examples demonstrate robust performance on singular solutions and fast computational times.

16.1 · NA · Jun 3, 2011

Convergent finite difference solvers for viscosity solutions of the elliptic Monge-Ampère equation in dimensions two and higher

Brittany D. Froese, Adam M. Oberman

The elliptic Monge-Ampère equation is a fully nonlinear Partial Differential Equation that originated in geometric surface theory and has been applied in dynamic meteorology, elasticity, geometric optics, image processing and image registration. Solutions can be singular, in which case standard numerical approaches fail. Novel solution methods are required for stability and convergence to the weak (viscosity) solution. In this article we build a wide stencil finite difference discretization for the Monge-Ampère equation. The scheme is monotone, so the Barles-Souganidis theory allows us to prove that the solution of the scheme converges to the unique viscosity solution of the equation. Solutions of the scheme are found using a damped Newton's method. We prove convergence of Newton's method and provide a systematic method to determine a starting point for the Newton iteration. Computational results are presented in two and three dimensions, which demonstrate the speed and accuracy of the method on a number of exact solutions, which range in regularity from smooth to non-differentiable.

13.4 · NA · Dec 3, 2012

Convergent filtered schemes for the Monge-Ampère partial differential equation

Brittany D. Froese, Adam M. Oberman

The theory of viscosity solutions has been effective for representing and approximating weak solutions to fully nonlinear Partial Differential Equations (PDEs) such as the elliptic Monge-Ampère equation. The approximation theory of Barles-Souganidis [Barles and Souganidis, Asymptotic Anal., 4 (1991) 271-283] requires that numerical schemes be monotone (or elliptic in the sense of [Oberman, SIAM J. Numer. Anal, 44 (2006) 879-895]). But such schemes have limited accuracy. In this article, we establish a convergence result for nearly monotone schemes. This allows us to construct finite difference discretizations of arbitrarily high-order. We demonstrate that the higher accuracy is achieved when solutions are sufficiently smooth. In addition, the filtered scheme provides a natural detection principle for singularities. We employ this framework to construct a formally second-order scheme for the Monge-Ampère equation and present computational results on smooth and singular solutions.
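
The idea of a "nearly monotone" filtered scheme can be illustrated with a small sketch. It assumes one common form of the construction: a monotone value and a more accurate value are blended through a bounded filter, so the result never strays more than `eps` from the monotone value. The names `filter_fn`, `filtered_scheme`, and `eps`, and the particular piecewise-linear filter, are illustrative assumptions rather than details taken from the abstract.

```python
def filter_fn(x: float) -> float:
    """Assumed filter: identity on [-1, 1], zero outside [-2, 2], linear in between."""
    ax = abs(x)
    if ax <= 1.0:
        return x
    if ax >= 2.0:
        return 0.0
    # Linear ramp back to zero that keeps the sign of x.
    ramp = 2.0 - ax
    return ramp if x > 0 else -ramp


def filtered_scheme(f_monotone: float, f_accurate: float, eps: float) -> float:
    """Blend two discretization values (hypothetical helper).

    When the accurate value is within eps of the monotone one, the accurate
    value is returned; when they disagree strongly (e.g. near a singularity),
    the result falls back to the monotone value.
    """
    return f_monotone + eps * filter_fn((f_accurate - f_monotone) / eps)


if __name__ == "__main__":
    print(filtered_scheme(1.0, 1.05, eps=0.1))  # close agreement: accurate value used
    print(filtered_scheme(1.0, 5.0, eps=0.1))   # strong disagreement: monotone value used
```

Because the filter is bounded by 1 in absolute value, the blended value differs from the monotone one by at most `eps`, which is the sense in which such a scheme is a small perturbation of a monotone scheme.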

4.3 · NA · Dec 5, 2012

Finite difference methods for the Infinity Laplace and p-Laplace equations

Adam M. Oberman

We build convergent discretizations and semi-implicit solvers for the Infinity Laplacian and the game theoretical $p$-Laplacian. The discretizations simplify and generalize earlier ones. We prove convergence of the solution of the Wide Stencil finite difference schemes to the unique viscosity solution of the underlying equation. We build a semi-implicit solver, which solves the Laplace equation at each step. It is fast in the sense that the number of iterations is independent of the problem size. This is an improvement over previous explicit solvers, which are slow due to the CFL-condition.

1.2 · NA · Nov 1, 2016

Finite difference methods for fractional Laplacians

Yanghong Huang, Adam Oberman

The fractional Laplacian $(-Δ)^{α/2}$ is the prototypical non-local elliptic operator. While analytical theory has been advanced and understood for some time, there remain many open problems in the numerical analysis of the operator. In this article, we study several different finite difference discretisations of the fractional Laplacian on uniform grids in one dimension that take the same form. Many properties can be compared and summarised in this relatively simple setting, to tackle more important questions like the nonlocality, singularity and flat tails common in practical implementations. The accuracy and the asymptotic behaviours of the methods are also studied, together with treatment of the far field boundary conditions, providing a unified perspective on the further development of the scheme in higher dimensions.

15.6 · LG · Oct 21, 2022

Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models

Vikram Voleti, Christopher Pal, Adam Oberman

Generative models based on denoising diffusion techniques have led to an unprecedented increase in the quality and diversity of imagery that is now possible to create with neural generative models. However, most contemporary state-of-the-art methods are derived from a standard isotropic Gaussian formulation. In this work we examine the situation where non-isotropic Gaussian distributions are used. We present the key mathematical derivations for creating denoising diffusion models using an underlying non-isotropic Gaussian noise model. We also provide initial experiments with the CIFAR-10 dataset to help verify empirically that this more general modeling approach can also yield high-quality samples.
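
As a rough illustration of what a non-isotropic Gaussian noise model changes, the sketch below perturbs a 2D data point with noise of covariance Σ instead of the identity. It assumes a forward step of the form x_t = sqrt(ᾱ) x_0 + sqrt(1 − ᾱ) L z with Σ = L Lᵀ and z standard normal; the function names, the 2D setting, and this exact parameterization are assumptions for illustration, not the paper's derivation.

```python
import math


def cholesky_2x2(cov):
    """Lower-triangular L with L @ L.T == cov, for a symmetric positive definite 2x2 matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return ((l11, 0.0), (l21, l22))


def perturb(x0, z, cov, alpha_bar):
    """Assumed forward noising step with correlated (non-isotropic) Gaussian noise.

    x0: clean 2D point, z: standard normal draw, cov: noise covariance,
    alpha_bar: signal-retention factor in [0, 1] (hypothetical name).
    """
    L = cholesky_2x2(cov)
    # Correlate the isotropic draw z: noise = L z has covariance cov.
    noise = (L[0][0] * z[0], L[1][0] * z[0] + L[1][1] * z[1])
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return tuple(signal_scale * xi + noise_scale * ni for xi, ni in zip(x0, noise))


if __name__ == "__main__":
    cov = ((4.0, 2.0), (2.0, 2.0))
    print(perturb((1.0, -1.0), (0.3, -0.7), cov, alpha_bar=0.5))
```

Setting `cov` to the identity matrix recovers the usual isotropic perturbation, which is the case the abstract describes as standard.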

2.3 · NA · Aug 2, 2013

A viscosity solution approach to the Monge-Ampere formulation of the Optimal Transportation Problem

Jean-David Benamou, Brittany D. Froese, Adam M. Oberman

In this work we present a numerical method for the Optimal Mass Transportation problem. Optimal Mass Transportation (OT) is an active research field in mathematics. It has recently led to significant theoretical results as well as applications in diverse areas. Numerical solution techniques for the OT problem remain underdeveloped. The solution is obtained by solving the second boundary value problem for the Monge-Ampere (MA) equation, a fully nonlinear elliptic partial differential equation (PDE). Instead of standard boundary conditions the problem has global state constraints. These are reformulated as a tractable local PDE. We give a proof of convergence of the numerical method, using the theory of viscosity solutions. Details of the implementation and a fast solution method are provided in the companion paper arXiv:1208.4870.

4.3 · NA · Aug 23, 2012

A numerical method for variational problems with convexity constraints

Adam M. Oberman

We consider the problem of approximating the solution of variational problems subject to the constraint that the admissible functions must be convex. This problem is at the interface between convex analysis, convex optimization, variational problems, and partial differential equation techniques. The approach is to approximate the (non-polyhedral) cone of convex functions by a polyhedral cone which can be represented by linear inequalities. This approach leads to an optimization problem with linear constraints which can be computed efficiently, hundreds of times faster than existing methods.

1.2 · NA · Nov 18, 2015

Adaptive finite difference methods for nonlinear elliptic and parabolic partial differential equations with free boundaries

Adam M. Oberman, Ian Zwiers

Monotone finite difference methods provide stable convergent discretizations of a class of degenerate elliptic and parabolic Partial Differential Equations (PDEs). These methods are best suited to regular rectangular grids, which leads to low accuracy near curved boundaries or singularities of solutions. In this article we combine monotone finite difference methods with an adaptive grid refinement technique to produce a PDE discretization and solver which is applied to a broad class of equations, in curved or unbounded domains which include free boundaries. The grid refinement is flexible and adaptive. The discretization is combined with a fast solution method, which incorporates asynchronous time stepping adapted to the spatial scale. The framework is validated on linear problems in curved and unbounded domains. Key applications include the obstacle problem and the one-phase Stefan free boundary problem.

3.7 · CV · Oct 3, 2022 · Code

A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods

Tiago Salvador, Kilian Fatras, Ioannis Mitliagkas et al.

Unsupervised Domain Adaptation (UDA) aims at classifying unlabeled target images leveraging source labeled ones. In this work, we consider the Partial Domain Adaptation (PDA) variant, where we have extra source classes not present in the target domain. Most successful algorithms use model selection strategies that rely on target labels to find the best hyper-parameters and/or models along training. However, these strategies violate the main assumption in PDA: only unlabeled target domain samples are available. Moreover, there are also inconsistencies in the experimental settings - architecture, hyper-parameter tuning, number of runs - yielding unfair comparisons. The main goal of this work is to provide a realistic evaluation of PDA methods with the different model selection strategies under a consistent evaluation protocol. We evaluate 7 representative PDA algorithms on 2 different real-world datasets using 7 different model selection strategies. Our two main findings are: (i) without target labels for model selection, the accuracy of the methods decreases up to 30 percentage points; (ii) only one method and model selection pair performs well on both datasets. Experiments were performed with our PyTorch framework, BenchmarkPDA, which we open source.

2.3 · NA · Feb 10, 2016

Numerical Methods for the 2-Hessian Elliptic Partial Differential Equation

Brittany D. Froese, Adam M. Oberman, Tiago Salvador

The elliptic 2-Hessian equation is a fully nonlinear partial differential equation (PDE) that is related to intrinsic curvature for three dimensional manifolds. We introduce two numerical methods for this PDE: the first is provably convergent to the viscosity solution, and the second is more accurate, and convergent in practice but lacks a proof. The PDE is elliptic on a restricted set of functions: a convexity type constraint is needed for the ellipticity of the PDE operator. Solutions with both discretizations are obtained using Newton's method. Computational results are presented on a number of exact solutions which range in regularity from smooth to nondifferentiable and in shape from convex to non convex.

1.2 · NA · Jul 13, 2018

Improved accuracy of monotone finite difference schemes on point clouds and regular grids

Chris Finlay, Adam Oberman

Finite difference schemes are the method of choice for solving nonlinear, degenerate elliptic PDEs, because the Barles-Souganidis convergence framework [Barles and Souganidis, Asymptotic Analysis, 4(3):271-283, 1991] provides sufficient conditions for convergence to the unique viscosity solution [Crandall, Ishii and Lions, Bull. Amer. Math Soc., 27(1):1-67, 1992]. For anisotropic operators, such as the Monge-Ampere equation, wide stencil schemes are needed [Oberman, SIAM J. Numer. Anal., 44(2):879-895]. The accuracy of these schemes depends on both the distances to neighbors, $R$, and the angular resolution, $dθ$. On uniform grids, the accuracy is $\mathcal O(R^2 + dθ)$. On point clouds, the most accurate schemes are of $\mathcal O(R + dθ)$, by Froese [Numerische Mathematik, 138(1):75-99, 2018]. In this work, we construct geometrically motivated schemes of higher accuracy in both cases: order $\mathcal O(R + dθ^2)$ on point clouds, and $\mathcal O(R^2 + dθ^2)$ on uniform grids.

3.3 · LG · Dec 22, 2022

EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models

Xinlin Li, Mariana Parazeres, Adam Oberman et al.

With the advent of deep learning applications on edge devices, researchers actively try to optimize their deployments on low-power and restricted memory devices. There are established compression methods such as quantization, pruning, and architecture search that leverage commodity hardware. Apart from conventional compression algorithms, one may redesign the operations of deep learning models to obtain a more efficient implementation. To this end, we propose EuclidNet, a compression method, designed to be implemented on hardware, which replaces multiplication, $xw$, with Euclidean distance $(x-w)^2$. We show that EuclidNet is aligned with matrix multiplication and it can be used as a measure of similarity in case of convolutional layers. Furthermore, we show that under various transformations and noise scenarios, EuclidNet exhibits the same performance as deep learning models designed with multiplication operations.
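
One elementary way to see why a squared distance can stand in for a product is the identity $xw = \tfrac12\left(x^2 + w^2 - (x-w)^2\right)$, applied coordinate by coordinate. The sketch below only checks this identity numerically; it is not the EuclidNet layer itself, and the helper names are hypothetical.

```python
def dot(x, w):
    """Ordinary inner product of two equal-length sequences."""
    return sum(xi * wi for xi, wi in zip(x, w))


def squared_distance(x, w):
    """Squared Euclidean distance, the quantity that replaces the product."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def dot_from_distance(x, w):
    """Recover the inner product from squared norms and the squared distance."""
    return 0.5 * (dot(x, x) + dot(w, w) - squared_distance(x, w))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    w = [4.0, -5.0, 6.0]
    print(dot(x, w), dot_from_distance(x, w))
```

Up to the norm terms, a larger inner product corresponds to a smaller squared distance, which is consistent with reading the negative squared distance as a similarity measure.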

1.2 · NA · Dec 16, 2016

Computing the quasiconvex envelope using a nonlocal line solver

Bilal Abbasi, Adam M. Oberman

Recently, in a series of articles, Barron, Goebel, and Jensen (2012, 2013) have studied second order degenerate elliptic PDEs and first order nonlocal PDEs for the quasiconvex envelope. Quasiconvex functions are functions whose sublevel sets are convex. These PDEs are difficult to solve. In this article we present an algorithm for computing the quasiconvex envelope (QCE) of a given function. The QCE operator is a level set operator, so this algorithm gives a method to compute the convex hull of sets represented by level set functions. We present a nonlocal line solver for the QCE, based on solving the one dimensional problem on lines. We find an explicit formula for the QCE of a function defined on a line.
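
For intuition about the one dimensional problem, the sketch below computes the largest quasiconvex function lying below sampled values on a line. It uses the observation that, on a line, each sublevel set of the envelope is the interval spanned by the corresponding sublevel set of the data, which gives the pointwise maximum of the running minimum from the left and the running minimum from the right. This is a sketch under that assumption and may differ in form from the formula in the paper; the function name is hypothetical.

```python
from itertools import accumulate


def quasiconvex_envelope_1d(g):
    """Largest quasiconvex minorant of the samples g on a 1D grid."""
    # Smallest value seen at or to the left of each grid point.
    left_min = list(accumulate(g, min))
    # Smallest value seen at or to the right of each grid point.
    right_min = list(accumulate(reversed(g), min))[::-1]
    # A level c is reached at x only if g <= c somewhere on both sides of x.
    return [max(l, r) for l, r in zip(left_min, right_min)]


if __name__ == "__main__":
    print(quasiconvex_envelope_1d([3, 1, 2, 0, 4]))  # [3, 1, 1, 0, 4]
```

In the example, the interior bump at value 2 is flattened to 1, so the result decreases and then increases, as a quasiconvex function on a line must.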

1.2 · NA · Oct 28, 2016

Numerical methods for motion of level sets by affine curvature

Adam M. Oberman, Tiago Salvador

We study numerical methods for the nonlinear partial differential equation that governs the motion of level sets by affine curvature. We show that standard finite difference schemes are nonlinearly unstable. We build convergent finite difference schemes, using the theory of viscosity solutions. We demonstrate that our approximate solutions capture the affine invariance and morphological properties of the evolution. Numerical experiments demonstrate the accuracy and stability of the discretization.

8.3 · LG · Jul 8

Fast Rates for Semi-Supervised Learning via Data-Augmentation Graph Regularization

Adam M. Oberman

Self-supervised learning matches supervised accuracy from a fraction of the labels, but the labeled-sample efficiency behind this has lacked a theoretical explanation. We provide one. Data augmentation induces a similarity graph on the unlabeled data, so downstream learning on that graph is graph-Laplacian-regularized learning. We prove a fast transductive rate, $O(1/n_L)$ in the number of labels, in place of the supervised $O(1/\sqrt{n_L})$, by carrying the leave-one-out stability apparatus of Johnson and Zhang (JMLR 2007) over to the augmentation graph, and without the unrealistic assumptions of limit-based analyses (exact kernel, generalizing features). The bound makes augmentation quality explicit: the expected error is at most $C/n_L + R_{\mathrm{DA}}(y)$, where the data-augmentation alignment error $R_{\mathrm{DA}}(y)$ is the graph-cut mass of augmentations that cross a label boundary, so good augmentations let few labels suffice. The analysis uses a streamlined loss that drops the projector, negative-sample, and orthogonality overhead of standard objectives yet still recovers the top-$K$ ideal features in the infinite-data limit, the augmentation-kernel eigenspace studied by Zhai et al. The result explains the observed accuracy-versus-label-count curve rather than only bounding a generalization gap.

7.0 · LG · Jul 8

Avoiding unsafe sets when training with Langevin Dynamics

Adam M. Oberman

Training a model with noisy gradient descent can be idealized as overdamped Langevin dynamics on the loss landscape, and a natural safety question is to bound the probability $ν_t(\mathcal{A}_H) = \mathbb{P}(Q_t \in \mathcal{A}_H)$ that the trajectory lies in a designated failure region $\mathcal{A}_H$. We study this for a smooth, strongly convex loss in $d$ dimensions and a failure region separated from the minimizer by an energy gap. Three bounds emerge. At the end of training, the equilibrium mass $π(\mathcal{A}_H)$ is exponentially small in $d$, with a complementary energy-barrier rate when the noise is small. Along the trajectory, a shape-free bound $ν_t(\mathcal{A}_H) \le π(\mathcal{A}_H)(1 + \sqrt{χ_0^2/π(\mathcal{A}_H)}\,e^{-mt})$ shows that the in-set probability relaxes to (twice) the static value after a burn-in time of order $d$, using only the global spectral gap $m$ of the loss. A worked Ornstein-Uhlenbeck example shows this burn-in is necessary: an angular slice of the equilibrium shell can transiently swell by a factor exponential in $d$, even though its equilibrium mass is tiny. To rule such swelling out we introduce a local relaxation rate attached to the failure region, defined through the spectral measure of its centered indicator rather than a Dirichlet-form Rayleigh quotient. For geometrically isolated regions this rate exceeds the global one, shrinking the burn-in proportionally, and combined with a maximum-principle ceiling it caps the trajectory probability uniformly in time. The picture is that strong convexity sets how fast training relaxes, but the shape of the unsafe set decides whether the trajectory bulges through it on the way home.
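
The quantity $ν_t(\mathcal{A}_H)$ can be estimated by simulation in the simplest setting. The sketch below is a one dimensional Monte Carlo illustration only: it assumes the quadratic loss $f(q) = m q^2/2$, the dynamics $dQ = -f'(Q)\,dt + \sqrt{2/β}\,dW$ discretized by Euler-Maruyama, and a hypothetical failure region $\{|q| \ge H\}$. The parameter names and the choice of region are assumptions, not the paper's setup.

```python
import math
import random


def in_set_probability(m, beta, H, t_end, dt=0.01, n_paths=1000, q0=0.0, seed=0):
    """Monte Carlo estimate of P(|Q_t| >= H) at time t_end for an Ornstein-Uhlenbeck process."""
    rng = random.Random(seed)
    steps = int(round(t_end / dt))
    # Standard deviation of the Brownian increment scaled by sqrt(2 / beta).
    sigma = math.sqrt(2.0 * dt / beta)
    hits = 0
    for _ in range(n_paths):
        q = q0
        for _ in range(steps):
            # Euler-Maruyama step: gradient drift plus Gaussian noise.
            q += -m * q * dt + sigma * rng.gauss(0.0, 1.0)
        if abs(q) >= H:
            hits += 1
    return hits / n_paths


if __name__ == "__main__":
    for t in (0.0, 0.5, 3.0):
        print(t, in_set_probability(m=1.0, beta=1.0, H=3.0, t_end=t))
```

With these settings the stationary law is Gaussian with variance $1/(βm)$, so the estimate at late times should be small; starting the paths inside the region (for example `q0=5.0`) shows the relaxation toward that value.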

34.0 · AI · Feb 21, 2025

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Yoshua Bengio, Michael Cohen, Damiano Fornasiere et al.

The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.

6.6 · LG · Dec 17, 2023 · Code

Harnessing small projectors and multiple views for efficient vision pretraining

Kumar Krishna Agrawal, Arna Ghosh, Shagun Sodhani et al.

Recent progress in self-supervised (SSL) visual representation learning has led to the development of several different proposed frameworks that rely on augmentations of images but use different loss functions. However, there are few theoretically grounded principles to guide practice, so practical implementation of each SSL framework requires several heuristics to achieve competitive performance. In this work, we build on recent analytical results to design practical recommendations for competitive and efficient SSL that are grounded in theory. Specifically, recent theory tells us that existing SSL frameworks are minimizing the same idealized loss, which is to learn features that best match the data similarity kernel defined by the augmentations used. We show how this idealized loss can be reformulated to a functionally equivalent loss that is more efficient to compute. We study the implicit bias of using gradient descent to minimize our reformulated loss function and find that using a stronger orthogonalization constraint with a reduced projector dimensionality should yield good representations. Furthermore, the theory tells us that approximating the reformulated loss should be improved by increasing the number of augmentations, and as such using multiple augmentations should lead to improved convergence. We empirically verify our findings on CIFAR, STL and Imagenet datasets, wherein we demonstrate an improved linear readout performance when training a ResNet-backbone using our theoretically grounded recommendations. Remarkably, we also demonstrate that by leveraging these insights, we can reduce the pretraining dataset size by up to 2$\times$ while maintaining downstream accuracy simply by using more data augmentations. Taken together, our work provides theoretically grounded recommendations that can be used to improve SSL convergence and efficiency.

4.1 · LG · May 17, 2025

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava, Siamak Ravanbakhsh, Adam Oberman

Recent work has formalized the reward hypothesis through the lens of expected utility theory, by interpreting reward as utility. Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory where utilities are lexicographically ordered vectors of arbitrary dimension. In this paper, we extend this result by identifying a simple and practical condition under which preferences cannot be represented by scalar rewards, necessitating a 2-dimensional reward function. We provide a full characterization of such reward functions, as well as the general d-dimensional case, in Markov Decision Processes (MDPs) under a memorylessness assumption on preferences. Furthermore, we show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, while in the Constrained MDP (CMDP) setting -- another common multiobjective setting -- they do not.
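
To make "lexicographically ordered vectors" concrete, the sketch below compares 2-dimensional returns so that the first component always dominates and the second only breaks ties. The action names and return values are invented for illustration; the only fact relied on is that Python compares tuples lexicographically.

```python
def best_action(returns):
    """Pick the action whose return vector is lexicographically largest."""
    # Tuples compare component by component, first entry taking priority.
    return max(returns, key=returns.get)


if __name__ == "__main__":
    # Hypothetical (primary, secondary) returns for three actions.
    returns = {
        "a": (1, 0.0),
        "b": (1, 5.0),    # ties with "a" on the primary objective, wins on the secondary
        "c": (0, 100.0),  # a huge secondary value cannot offset a lower primary value
    }
    print(best_action(returns))  # b
```

The action `"c"` shows why a single scalar reward can fail here: no finite weight on the secondary component should ever let it outrank a strictly better primary component.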

2.6 · CV · Jun 15, 2021 · Code

Multi-Resolution Continuous Normalizing Flows

Vikram Voleti, Chris Finlay, Adam Oberman et al.

Recent work has shown that Neural Ordinary Differential Equations (ODEs) can serve as generative models of images using the perspective of Continuous Normalizing Flows (CNFs). Such models offer exact likelihood calculation, and invertible generation/density estimation. In this work we introduce a Multi-Resolution variant of such models (MRCNF), by characterizing the conditional distribution over the additional information required to generate a fine image that is consistent with the coarse image. We introduce a transformation between resolutions that allows for no change in the log likelihood. We show that this approach yields comparable likelihood values for various image datasets, with improved performance at higher resolutions, with fewer parameters, using only 1 GPU. Further, we examine the out-of-distribution properties of (Multi-Resolution) Continuous Normalizing Flows, and find that they are similar to those of other likelihood-based generative models.

3.6 · ML · Jun 7, 2021

Frustratingly Easy Uncertainty Estimation for Distribution Shift

Tiago Salvador, Vikram Voleti, Alexander Iannantuono et al.

Distribution shift is an important concern in deep image classification, produced either by corruption of the source images, or a complete change, with the solution involving domain adaptation. While the primary goal is to improve accuracy under distribution shift, an important secondary goal is uncertainty estimation: evaluating the probability that the prediction of a model is correct. While improving accuracy is hard, uncertainty estimation turns out to be frustratingly easy. Prior works have appended uncertainty estimation into the model and training paradigm in various ways. Instead, we show that we can estimate uncertainty by simply exposing the original model to corrupted images, and performing simple statistical calibration on the image outputs. Our frustratingly easy methods demonstrate superior performance on a wide range of distribution shifts as well as on unsupervised domain adaptation tasks, measured through extensive experimentation.

11.1 · CV · Jun 7, 2021

FairCal: Fairness Calibration for Face Verification

Tiago Salvador, Stephanie Cairns, Vikram Voleti et al.

Despite being widely used, face recognition models suffer from bias: the probability of a false positive (incorrect face match) strongly depends on sensitive attributes such as the ethnicity of the face. As a result, these models can disproportionately and negatively impact minority groups, particularly when used by law enforcement. The majority of bias reduction methods have several drawbacks: they use an end-to-end retraining approach, may not be feasible due to privacy issues, and often reduce accuracy. An alternative approach is post-processing methods that build fairer decision classifiers using the features of pre-trained models, thus avoiding the cost of retraining. However, they still have drawbacks: they reduce accuracy (AGENDA, PASS, FTC), or require retuning for different false positive rates (FSN). In this work, we introduce the Fairness Calibration (FairCal) method, a post-training approach that simultaneously: (i) increases model accuracy (improving the state-of-the-art), (ii) produces fairly-calibrated probabilities, (iii) significantly reduces the gap in the false positive rates, (iv) does not require knowledge of the sensitive attribute, and (v) does not require retraining, training an additional model, or retuning. We apply it to the task of Face Verification, and obtain state-of-the-art results with all the above advantages.

1.2 · LG · Oct 5, 2020 · Code

Adversarial Boot Camp: label free certified robustness in one epoch

Ryan Campbell, Chris Finlay, Adam M Oberman

Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive model evaluations with random noise added to a given input. In our work, we present a deterministic certification approach which results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss, and the expected values of Gaussian averages. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch without the use of label information.

2.3 · LG · Jun 10, 2020 · Code

Deterministic Gaussian Averaged Neural Networks

Ryan Campbell, Chris Finlay, Adam M Oberman

We present a deterministic method to compute the Gaussian average of neural networks used in regression and classification. Our method is based on an equivalence between training with a particular regularized loss, and the expected values of Gaussian averages. We use this equivalence to certify models which perform well on clean data but are not robust to adversarial perturbations. In terms of certified accuracy and adversarial robustness, our method is comparable to known stochastic methods such as randomized smoothing, but requires only a single model evaluation during inference.

14.0 · LG · Jun 10, 2020

Learning normalizing flows from Entropy-Kantorovich potentials

Chris Finlay, Augusto Gerolin, Adam M Oberman et al.

We approach the problem of learning continuous normalizing flows from a dual perspective motivated by entropy-regularized optimal transport, in which continuous normalizing flows are cast as gradients of scalar potential functions. This formulation allows us to train a dual objective comprised only of the scalar potential functions, and removes the burden of explicitly computing normalizing flows during training. After training, the normalizing flow is easily recovered from the potential functions.

39.4 · ML · Feb 7, 2020

How to train your neural ODE: the world of Jacobian and kinetic regularization

Chris Finlay, Jörn-Henrik Jacobsen, Levon Nurbekyan et al.

Training neural ODEs on large datasets has not been tractable due to the necessity of allowing the adaptive numerical ODE solver to refine its step size to very small values. In practice this leads to dynamics equivalent to many hundreds or even thousands of layers. In this paper, we overcome this apparent difficulty by introducing a theoretically-grounded combination of both optimal transport and stability regularizations which encourage neural ODEs to prefer simpler dynamics out of all the dynamics that solve a problem well. Simpler dynamics lead to faster convergence and to fewer discretizations of the solver, considerably decreasing wall-clock time without loss in performance. Our approach allows us to train neural ODE-based generative models to the same performance as the unregularized dynamics, with significant reductions in training time. This brings neural ODEs closer to practical relevance in large-scale applications.

1.8 · LG · Oct 4, 2019

Farkas layers: don't shift the data, fix the geometry

Aram-Alexandre Pooladian, Chris Finlay, Adam M Oberman

Successfully training deep neural networks often requires either batch normalization or appropriate weight initialization, both of which come with their own challenges. We propose an alternative, geometrically motivated method for training. Using elementary results from linear programming, we introduce Farkas layers: a method that ensures at least one neuron is active at a given layer. Focusing on residual networks with ReLU activation, we empirically demonstrate a significant improvement in training capacity in the absence of batch normalization or methods of initialization across a broad range of network sizes on benchmark datasets.

4.1LGOct 3, 2019

Partial differential equation regularization for supervised machine learning

Adam M Oberman

This article is an overview of supervised machine learning problems for regression and classification. Topics include: kernel methods, training by stochastic gradient descent, deep learning architecture, losses for classification, statistical learning theory, and dimension independent generalization bounds. Examples of implicit regularization in deep learning are presented, including data augmentation, adversarial training, and additive noise. These methods are reframed as explicit gradient regularization.

6.6LGAug 5, 2019Code

A principled approach for generating adversarial images under non-smooth dissimilarity metrics

Aram-Alexandre Pooladian, Chris Finlay, Tim Hoheisel et al.

Deep neural networks perform well on real world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by $\ell_p$ norms, but in fact any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, $\ell_1, \ell_2$, and $\ell_\infty$ perturbations; the $\ell_0$ counting "norm" (i.e. true sparseness); and the total variation seminorm, which is a (non-$\ell_p$) convolutional dissimilarity measuring local pixel changes. Our approach is a natural extension of a recent adversarial attack method, and eliminates the differentiability requirement of the metric. We demonstrate our algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet-1k datasets. We consider undefended and defended models, and show that our algorithm easily transfers to various datasets. We observe that ProxLogBarrier outperforms a host of modern adversarial attacks specialized for the $\ell_0$ case. Moreover, by altering images in the total variation seminorm, we shed light on a new class of perturbations that exploit neighboring pixel information.
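
As an illustration of what "a closed proximal form" means, the sketch below gives the standard proximal operators for the $\ell_1$ norm (soft thresholding) and the $\ell_0$ counting "norm" (hard thresholding). It is not the ProxLogBarrier attack itself; the function names `prox_l1` and `prox_l0` and the weight `lam` are illustrative choices, not taken from the paper.

```python
import math

def prox_l1(v, lam):
    """Soft thresholding: argmin_x lam*||x||_1 + 0.5*||x - v||^2."""
    # Shrink each entry toward zero by lam, keeping its sign.
    return [math.copysign(max(abs(vi) - lam, 0.0), vi) for vi in v]

def prox_l0(v, lam):
    """Hard thresholding: argmin_x lam*||x||_0 + 0.5*||x - v||^2."""
    # Keeping an entry costs lam; zeroing it costs 0.5*vi^2.
    # So an entry survives only if |vi| > sqrt(2*lam).
    thresh = math.sqrt(2.0 * lam)
    return [vi if abs(vi) > thresh else 0.0 for vi in v]
```

Both operators are evaluated entrywise and need no derivative of the metric, which is consistent with the abstract's remark that the differentiability requirement is removed.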

22.6MLMay 27, 2019Code

Scaleable input gradient regularization for adversarial robustness

Chris Finlay, Adam M Oberman

In this work we revisit gradient regularization for adversarial robustness with some new ingredients. First, we derive new per-image theoretical robustness bounds based on local gradient information. These bounds strongly motivate input gradient regularization. Second, we implement a scaleable version of input gradient regularization which avoids double backpropagation: adversarially robust ImageNet models are trained in 33 hours on four consumer grade GPUs. Finally, we show experimentally and through theoretical certification that input gradient regularization is competitive with adversarial training. Moreover we demonstrate that gradient regularization does not lead to gradient obfuscation or gradient masking.
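
One way to penalize the input gradient norm without double backpropagation is a finite-difference approximation along the normalized gradient direction. The sketch below assumes that approach and uses a toy quadratic `loss` in place of a network loss; the names `fd_grad_norm` and `regularized_loss`, the step `h`, the weight `lam`, and the squared-norm penalty are all illustrative assumptions, not details confirmed by the abstract.

```python
import math

def loss(x):
    # Toy stand-in for a network loss (assumption): L(x) = 0.5 * ||x||^2.
    return 0.5 * sum(xi * xi for xi in x)

def loss_grad(x):
    # Input gradient of the toy loss. For a real network this would be
    # one ordinary backward pass with respect to the input.
    return list(x)

def fd_grad_norm(x, h=0.01):
    """Approximate ||grad L(x)|| by (L(x + h*d) - L(x)) / h, d = grad / ||grad||."""
    g = loss_grad(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0
    d = [gi / norm for gi in g]          # unit direction, treated as a constant
    x_shift = [xi + h * di for xi, di in zip(x, d)]
    return (loss(x_shift) - loss(x)) / h

def regularized_loss(x, lam=0.1, h=0.01):
    # Loss plus a squared gradient-norm penalty (the penalty form is an assumption).
    return loss(x) + lam * fd_grad_norm(x, h) ** 2
```

Because the direction `d` is held fixed, differentiating the penalty with respect to model parameters would require only first-order gradients of two loss evaluations.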

4.9MLMar 21, 2019

Calibrated Top-1 Uncertainty estimates for classification by score based models

Adam M. Oberman, Chris Finlay, Alexander Iannantuono et al.

While the accuracy of modern deep learning models has significantly improved in recent years, the ability of these models to generate uncertainty estimates has not progressed to the same degree. Uncertainty methods are designed to provide an estimate of class probabilities when predicting class assignment. While there are a number of proposed methods for estimating uncertainty, they all suffer from a lack of calibration: predicted probabilities can be off from empirical ones by a few percent or more. By restricting the scope of our predictions to only the probability of Top-1 error, we can decrease the calibration error of existing methods to less than one percent. As a result, the scores of the methods also improve significantly over benchmarks.
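
Calibration error for Top-1 predictions is commonly measured by binning predictions by confidence and comparing average confidence with empirical accuracy in each bin. The sketch below assumes this binned expected calibration error; the function name, the equal-width binning, and `n_bins` are illustrative choices and may differ from the metric used in the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error of Top-1 predictions.

    confidences: predicted probability that the Top-1 class is right, in [0, 1].
    correct: booleans, True when the Top-1 prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's gap by the fraction of samples it holds.
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```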

18.1LGOct 1, 2018

Improved robustness to adversarial examples using Lipschitz regularization of the loss

Chris Finlay, Adam Oberman, Bilal Abbasi

We augment adversarial training (AT) with worst case adversarial training (WCAT) which improves adversarial robustness by 11% over the current state-of-the-art result in the $\ell_2$ norm on CIFAR-10. We obtain verifiable average case and worst case robustness guarantees, based on the expected and maximum values of the norm of the gradient of the loss. We interpret adversarial training as Total Variation Regularization, which is a fundamental tool in mathematical image processing, and WCAT as Lipschitz regularization.

21.9LGAug 28, 2018

Lipschitz regularized Deep Neural Networks generalize and are adversarially robust

Chris Finlay, Jeff Calder, Bilal Abbasi et al.

In this work we study input gradient regularization of deep neural networks, and demonstrate that such regularization leads to generalization proofs and improved adversarial robustness. The proof of generalization does not overcome the curse of dimensionality, but it is independent of the number of layers in the networks. The adversarial robustness regularization combines adversarial training, which we show to be equivalent to Total Variation regularization, with Lipschitz regularization. We demonstrate empirically that the regularized models are more robust, and that gradient norms of images can be used for attack detection.

6.5OCOct 21, 2017

Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for $k$-means Clustering

Penghang Yin, Minh Pham, Adam Oberman et al.

In this paper, we propose an implicit gradient descent algorithm for the classic $k$-means problem. The implicit gradient step or backward Euler is solved via stochastic fixed-point iteration, in which we randomly sample a mini-batch gradient in every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent (Entropy-SGD) for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results than $k$-means algorithms, in the sense that it decreases the objective function (the cluster) and is much more robust to initialization.
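
The steps described in this abstract can be sketched for one-dimensional data as follows. The implicit step $x^{k+1} = x^k - \gamma \nabla f(x^{k+1})$ is approximated by iterating $y \leftarrow x^k - \gamma \nabla f_{\text{batch}}(y)$ with a fresh mini-batch each time, and the average of the iterates becomes $x^{k+1}$. The function names, the step size, the iteration counts, and the restriction to 1-D points are assumptions made for brevity, not details from the paper.

```python
import random

def minibatch_grad(centroids, batch):
    """Gradient of (1/(2B)) * sum_p min_j (p - c_j)^2 with respect to each centroid."""
    grad = [0.0] * len(centroids)
    for p in batch:
        # Assign the point to its nearest centroid.
        j = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
        grad[j] += (centroids[j] - p) / len(batch)
    return grad

def stochastic_backward_euler(points, centroids, step=0.5, outer=100,
                              inner=10, batch_size=3, seed=0):
    rng = random.Random(seed)
    x = list(centroids)
    for _ in range(outer):
        y = list(x)
        avg = [0.0] * len(x)
        for _ in range(inner):
            batch = rng.sample(points, batch_size)
            g = minibatch_grad(y, batch)
            # Fixed-point map for the implicit (backward Euler) step.
            y = [xj - step * gj for xj, gj in zip(x, g)]
            avg = [a + yj / inner for a, yj in zip(avg, y)]
        # Carry the trajectory average over to the next outer step.
        x = avg
    return x
```

For two well-separated groups of points, for example values near 0 and near 10 with initial centroids `[1.0, 8.0]`, the returned centroids should settle close to the two group means.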

17.9LGApr 17, 2017

Deep Relaxation: partial differential equations for optimizing deep neural networks

Pratik Chaudhari, Adam Oberman, Stanley Osher et al.

In this paper we establish a connection between non-convex optimization methods for training deep neural networks and nonlinear partial differential equations (PDEs). Relaxation techniques arising in statistical physics which have already been used successfully in this context are reinterpreted as solutions of a viscous Hamilton-Jacobi PDE. A stochastic control interpretation allows us to prove that the modified algorithm performs better in expectation than stochastic gradient descent. Well-known PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. The PDE is derived from a stochastic homogenization problem, which arises in the implementation of the algorithm. The algorithms scale well in practice and can effectively tackle the high dimensionality of modern neural networks.

1.2NASep 11, 2015

An efficient linear programming method for Optimal Transportation

Adam M. Oberman, Yuanlong Ruan

An efficient method for computing solutions to the Optimal Transportation (OT) problem with a wide class of cost functions is presented. The standard linear programming (LP) discretization of the continuous problem becomes intractable for moderate grid sizes. A grid refinement method results in a linear cost algorithm. Weak convergence of solutions is established. Barycentric projection of transference plans is used to improve the accuracy of solutions. The method is applied to more general problems, including partial optimal transportation, and barycenter problems. Computational examples validate the accuracy and efficiency of the method. Optimal maps between nonconvex domains, partial OT free boundaries, and high accuracy barycenters are presented.
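
For reference, the standard LP discretization mentioned above can be written as follows. The notation is an assumption made here, not the paper's: $\mu_i$ and $\nu_j$ are the discretized source and target masses on $m$ and $n$ grid points, $c_{ij}$ is the cost of moving mass from point $i$ to point $j$, and $\pi_{ij}$ is the transference plan.

```latex
\begin{aligned}
\min_{\pi \ge 0} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}\, \pi_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{n} \pi_{ij} = \mu_i, \quad i = 1, \dots, m, \\
& \sum_{i=1}^{m} \pi_{ij} = \nu_j, \quad j = 1, \dots, n.
\end{aligned}
```

The plan has $m \times n$ unknowns, so the problem size grows with the product of the two grid sizes, which is why the direct discretization becomes expensive quickly.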

1.2NANov 13, 2014

Numerical Methods for the Fractional Laplacian: a Finite Difference-quadrature Approach

Yanghong Huang, Adam Oberman

The fractional Laplacian $(-Δ)^{α/2}$ is a non-local operator which depends on the parameter $α$ and recovers the usual Laplacian as $α\to 2$. A numerical method for the fractional Laplacian is proposed, based on the singular integral representation for the operator. The method combines finite difference with numerical quadrature, to obtain a discrete convolution operator with positive weights. The accuracy of the method is shown to be $O(h^{3-α})$. Convergence of the method is proven. Far field boundary conditions are treated using an asymptotic approximation to the integral, which yields an accurate method. Numerical experiments on known exact solutions validate the predicted convergence rates. Computational examples include exponentially and algebraically decaying solutions with varying regularity. The generalization to nonlinear equations involving the operator is discussed: the obstacle problem for the fractional Laplacian is computed.
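
For reference, the singular integral representation takes the following standard form for $0 < \alpha < 2$, with a normalization constant $C_{n,\alpha}$ depending on the dimension $n$ and on $\alpha$. The second line is a schematic of a discrete convolution operator on a uniform one-dimensional grid $x_i = ih$; the weights $w_j$ are left unspecified here, and the exact discretization in the paper may differ.

```latex
(-\Delta)^{\alpha/2} u(x)
  = C_{n,\alpha}\, \mathrm{P.V.}\! \int_{\mathbb{R}^n}
    \frac{u(x) - u(y)}{|x - y|^{\,n+\alpha}} \, dy ,
\qquad
(-\Delta_h)^{\alpha/2} u_i
  = \sum_{j \neq 0} \bigl( u_i - u_{i-j} \bigr)\, w_j ,
\quad w_j \ge 0 .
```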

1.2NANov 13, 2014

Numerical methods for matching for teams and Wasserstein barycenters

Guillaume Carlier, Adam Oberman, Edouard Oudet

Equilibrium multi-population matching (matching for teams) is a problem from mathematical economics which is related to multi-marginal optimal transport. A special but important case is the Wasserstein barycenter problem, which has applications in image processing and statistics. Two algorithms are presented: a linear programming algorithm and an efficient nonsmooth optimization algorithm, which applies in the case of the Wasserstein barycenters. The measures are approximated by discrete measures: convergence of the approximation is proved. Numerical results are presented which illustrate the efficiency of the algorithms.