Howard Heaton

LG
8 papers
582 citations

8 Papers

LG | Jan 31, 2023 | Code
Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis-Yin Splitting

Daniel McKenzie, Samy Wu Fung, Howard Heaton

In many applications, a combinatorial problem must be solved repeatedly with similar but distinct parameters. Yet the parameters $w$ are not directly observed; only contextual data $d$ that correlates with $w$ is available. It is tempting to use a neural network to predict $w$ given $d$. However, training such a model requires reconciling the discrete nature of combinatorial optimization with the gradient-based frameworks used to train neural networks. We study the case where the problem in question is an Integer Linear Program (ILP). We propose applying a three-operator splitting technique, also known as Davis-Yin splitting (DYS), to the quadratically regularized continuous relaxation of the ILP. We prove that the resulting scheme is compatible with the recently introduced Jacobian-free backpropagation (JFB). Our experiments on two representative ILPs, the shortest path problem and the knapsack problem, demonstrate that this combination (DYS on the forward pass, JFB on the backward pass) yields a scheme that scales more effectively to high-dimensional problems than existing schemes. All code associated with this paper is available at github.com/mines-opt-ml/fpo-dys.
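A minimal sketch of a DYS forward pass, assuming the regularized relaxation is $\min_x w^\top x + \frac{\gamma}{2}\|x\|^2$ subject to $Ax = b$ and $0 \le x \le 1$, and that the three operators are a box projection, a projection onto $\{x : Ax = b\}$, and a gradient step on the smooth quadratic term. This splitting, the function name `dys_relaxed_ilp`, the parameters `gamma` and `alpha`, and the toy instance are illustrative assumptions, not code from the fpo-dys repository.

```python
import numpy as np


def project_affine(y, A, b):
    """Orthogonal projection onto {x : A x = b} (A assumed full row rank)."""
    return y - A.T @ np.linalg.solve(A @ A.T, A @ y - b)


def dys_relaxed_ilp(w, A, b, gamma=0.1, alpha=1.0, n_iter=500):
    """Davis-Yin splitting for
        min_x  w^T x + (gamma/2) ||x||^2   s.t.  A x = b,  0 <= x <= 1.

    Assumed splitting for this sketch:
      g = indicator of the box [0, 1]^n  -> prox is clipping
      f = indicator of {A x = b}         -> prox is project_affine
      h = w^T x + (gamma/2) ||x||^2      -> smooth, grad h(x) = w + gamma x
    grad h is gamma-Lipschitz, so the step alpha must lie in (0, 2/gamma).
    """
    z = np.zeros_like(w, dtype=float)
    for _ in range(n_iter):
        x_g = np.clip(z, 0.0, 1.0)                                 # prox of g
        grad_h = w + gamma * x_g                                   # gradient of h
        x_f = project_affine(2 * x_g - z - alpha * grad_h, A, b)  # prox of f
        z = z + x_f - x_g                                          # DYS update
    return np.clip(z, 0.0, 1.0)


# Toy instance: choose exactly one of three items (A x = b forces sum(x) = 1).
w = np.array([3.0, 1.0, 2.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
print(dys_relaxed_ilp(w, A, b))  # approximately [0, 1, 0]: the cheapest item
```

With a small `gamma` the regularized solution here coincides with a vertex of the relaxation, so the output is an exact 0/1 vector; a larger `gamma` generally smooths the map from $w$ to the solution, which is what makes differentiating through it tractable.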

OC | Apr 29, 2022
Explainable AI via Learning to Optimize

Howard Heaton, Samy Wu Fung

Indecipherable black boxes are common in machine learning (ML), but applications increasingly require explainable artificial intelligence (XAI). The core of XAI is to establish transparent and interpretable data-driven algorithms. This work provides concrete tools for XAI in situations where prior knowledge must be encoded and untrustworthy inferences flagged. We use the "learn to optimize" (L2O) methodology, wherein each inference solves a data-driven optimization problem. Our L2O models are straightforward to implement, directly encode prior knowledge, and yield theoretical guarantees (e.g., satisfaction of constraints). We also propose the use of interpretable certificates to verify whether model inferences are trustworthy. Numerical examples are provided for dictionary-based signal recovery, CT imaging, and arbitrage trading of cryptoassets. Code and additional documentation can be found at https://xai-l2o.research.typal.academy.

OC | Mar 23, 2021 | Code
Learning to Optimize: A Primer and A Benchmark

Tianlong Chen, Xiaohan Chen, Wuyang Chen et al.

Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods, aiming to reduce the laborious iterations of hand engineering. It automates the design of an optimization method based on its performance on a set of training problems. This data-driven procedure generates methods that can efficiently solve problems similar to those seen in training. In sharp contrast, traditional designs of optimization methods are theory-driven, so they obtain performance guarantees over the classes of problems specified by the theory. This difference makes L2O suitable for repeatedly solving a certain type of optimization problem over a specific distribution of data, while it typically fails on out-of-distribution problems. The practicality of L2O depends on the type of target optimization, the chosen architecture of the method to learn, and the training procedure. This new paradigm has motivated a community of researchers to explore L2O and report their findings. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization. We set up taxonomies, categorize existing works and research directions, present insights, and identify open challenges. We also benchmark many existing L2O approaches on a few representative optimization problems. For reproducible research and fair benchmarking purposes, we released our software implementation and data in the package Open-L2O at https://github.com/VITA-Group/Open-L2O.

LG | Jun 2, 2021
Operator Splitting for Learning to Predict Equilibria in Convex Games

Daniel McKenzie, Howard Heaton, Qiuwei Li et al.

Systems of competing agents can often be modeled as games. Assuming rationality, the most likely outcomes are given by an equilibrium (e.g., a Nash equilibrium). In many practical settings, games are influenced by context, i.e., additional data beyond the control of any agent (e.g., weather for traffic and fiscal policy for market economies). Often the exact game mechanics are unknown, yet vast amounts of historical data consisting of (context, equilibrium) pairs are available, raising the possibility of learning a solver which predicts the equilibria given only the context. We introduce Nash Fixed Point Networks (N-FPNs), a class of neural networks that naturally output equilibria. Crucially, N-FPNs employ a constraint decoupling scheme to handle complicated agent action sets while avoiding expensive projections. Empirically, we find N-FPNs are compatible with the recently developed Jacobian-Free Backpropagation technique for training implicit networks, making them significantly faster and easier to train than prior models. Our experiments show N-FPNs are capable of scaling to problems orders of magnitude larger than existing learned game solvers.

LG | Apr 29, 2021
Feasibility-based Fixed Point Networks

Howard Heaton, Samy Wu Fung, Aviv Gibali et al.

Inverse problems consist of recovering a signal from a collection of noisy measurements. These problems can often be cast as feasibility problems; however, additional regularization is typically necessary to ensure accurate and stable recovery with respect to data perturbations. Hand-chosen analytic regularization can yield desirable theoretical guarantees, but such approaches have limited effectiveness in recovering signals due to their inability to leverage large amounts of available data. To this end, this work fuses data-driven regularization and convex feasibility in a theoretically sound manner. This is accomplished using feasibility-based fixed point networks (F-FPNs). Each F-FPN defines a collection of nonexpansive operators, each of which is the composition of a projection-based operator and a data-driven regularization operator. Fixed point iteration is used to compute fixed points of these operators, and the weights of the operators are tuned so that the fixed points closely represent available data. Numerical examples show that F-FPNs outperform standard TV-based recovery methods for CT reconstruction as well as a comparable neural network based on algorithm unrolling.
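To make the fixed point iteration concrete, here is a toy sketch with a single operator $T = P \circ R$, where $P$ projects onto the measurement-consistent set $\{x : Ax = b\}$ and $R$ is soft-thresholding. Soft-thresholding is a hypothetical hand-chosen stand-in for the trained regularization network, and the function names and small system are assumptions for illustration. Both $P$ and $R$ are firmly nonexpansive, so plain iteration of their composition converges whenever a fixed point exists.

```python
import numpy as np


def soft_threshold(x, lam):
    """Prox of lam * ||x||_1: a hypothetical stand-in for the learned
    regularization operator (firmly nonexpansive, like a projection)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def project_measurements(y, A, b):
    """Orthogonal projection onto the measurement-consistent set {x : A x = b}."""
    return y - A.T @ np.linalg.solve(A @ A.T, A @ y - b)


def fixed_point_recover(A, b, lam=0.1, tol=1e-10, max_iter=10_000):
    """Iterate x <- P(R(x)) until successive iterates differ by less than tol."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        x_next = project_measurements(soft_threshold(x, lam), A, b)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x


A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
print(fixed_point_recover(A, b))  # approximately [0.05, 0.95, 0.05]
```

Every iterate lies in $\{x : Ax = b\}$ by construction, so measurement consistency holds exactly while the regularizer decides which consistent signal is selected; in an F-FPN that choice would be learned from data.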

LG | Mar 23, 2021
JFB: Jacobian-Free Backpropagation for Implicit Networks

Samy Wu Fung, Howard Heaton, Qiuwei Li et al.

A promising trend in deep learning replaces traditional feedforward networks with implicit networks. Unlike traditional networks, implicit networks solve a fixed point equation to compute inferences. The cost of solving for the fixed point varies with the provided data and the error tolerance. Importantly, implicit networks may be trained with fixed memory costs, in stark contrast to feedforward networks, whose memory requirements scale linearly with depth. However, there is no free lunch: backpropagation through implicit networks often requires solving a costly Jacobian-based equation arising from the implicit function theorem. We propose Jacobian-Free Backpropagation (JFB), a fixed-memory approach that circumvents the need to solve Jacobian-based equations. JFB makes implicit networks faster to train and significantly easier to implement, without sacrificing test accuracy. Our experiments show implicit networks trained with JFB are competitive with feedforward networks and prior implicit networks given the same number of parameters.
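A toy comparison of the two backward passes, assuming a linear implicit layer $T(x;\theta) = Mx + \theta$ with $\|M\| < 1$ and the loss $\frac{1}{2}\|x^\star - y\|^2$. The exact implicit-function-theorem gradient requires a solve with $(I - M)^\top$; the JFB-style gradient backpropagates through a single application of $T$ at the fixed point and drops that factor. The layer, loss, and function names are hypothetical; a practical implementation would use an autodiff framework and detach the fixed-point iterate instead of writing gradients by hand.

```python
import numpy as np


def fixed_point(M, theta, tol=1e-12, max_iter=1000):
    """Forward pass of a toy implicit layer x = T(x; theta) = M x + theta.
    Only the current iterate is kept, so memory does not grow with depth."""
    x = np.zeros_like(theta)
    for _ in range(max_iter):
        x_next = M @ x + theta
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def jfb_grad(M, theta, y):
    """JFB-style gradient of 0.5 * ||x* - y||^2 w.r.t. theta: differentiate one
    application of T at the detached fixed point, dropping (I - dT/dx)^{-1}.
    Here dT/dtheta = I, so the result is simply x* - y."""
    return fixed_point(M, theta) - y


def implicit_grad(M, theta, y):
    """Exact gradient via the implicit function theorem, for comparison:
    solves the Jacobian-based system (I - M)^T g = x* - y that JFB skips."""
    n = theta.shape[0]
    return np.linalg.solve((np.eye(n) - M).T, fixed_point(M, theta) - y)


M = np.array([[0.2, 0.1], [0.0, 0.3]])  # contraction, so x* is unique
y = np.array([1.0, -2.0])
theta = np.zeros(2)
print(jfb_grad(M, theta, y) @ implicit_grad(M, theta, y) > 0)  # True
for _ in range(100):
    theta -= 0.5 * jfb_grad(M, theta, y)  # train using JFB gradients only
print(fixed_point(M, theta))  # approximately y = [1, -2]
```

On this example the JFB direction has a positive inner product with the true gradient, so it is a descent direction, and plain gradient steps with it still drive $x^\star$ to the target without ever forming or inverting a Jacobian.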

LG | Aug 5, 2020
Wasserstein-based Projections with Applications to Inverse Problems

Howard Heaton, Samy Wu Fung, Alex Tong Lin et al.

Inverse problems consist of recovering a signal from a collection of noisy measurements. These are typically cast as optimization problems, with classic approaches using a data fidelity term and an analytic regularizer that stabilizes recovery. Recent Plug-and-Play (PnP) works propose replacing the operator for analytic regularization in optimization methods with a data-driven denoiser. These schemes obtain state-of-the-art results, but at the cost of limited theoretical guarantees. To bridge this gap, we present a new algorithm that takes samples from the manifold of true data as input and outputs an approximation of the projection operator onto this manifold. Under standard assumptions, we prove this algorithm generates a learned operator, called a Wasserstein-based projection (WP), that approximates the true projection with high probability. Thus, WPs can be inserted into optimization methods in the same manner as PnP, but now with theoretical guarantees. Numerical examples show WPs obtain state-of-the-art results for unsupervised PnP signal recovery.

OC | Mar 4, 2020
Safeguarded Learned Convex Optimization

Howard Heaton, Xiaohan Chen, Zhangyang Wang et al.

Applications abound in which optimization problems must be repeatedly solved, each time with new (but similar) data. Analytic optimization algorithms can be hand-designed to provably solve these problems in an iterative fashion. On the one hand, data-driven algorithms can "learn to optimize" (L2O) with far fewer iterations and a per-iteration cost similar to that of general-purpose optimization algorithms. On the other hand, many L2O algorithms unfortunately lack convergence guarantees. To fuse the advantages of these approaches, we present a Safe-L2O framework. Safe-L2O updates incorporate a safeguard to guarantee convergence for convex problems with proximal and/or gradient oracles. The safeguard is simple and computationally cheap to implement, and it is activated only when the data-driven L2O updates would perform poorly or appear to diverge. This yields the numerical benefits of employing machine learning to create rapid L2O algorithms while still guaranteeing convergence. Our numerical examples show convergence of Safe-L2O algorithms, even when the provided data is not drawn from the training distribution.
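A simplified sketch of the safeguarding idea, assuming the quality measure is the fixed-point residual $\|x - T(x)\|$ of a classical convergent operator $T$ (here one gradient descent step on a convex quadratic) and that a learned step is accepted only if it shrinks that residual geometrically. The acceptance rule, the `decay` parameter, and the deliberately over-aggressive `learned_update` are assumptions for illustration; the paper's exact safeguard conditions may differ.

```python
import numpy as np


def safeguarded_l2o(T, learned_update, x0, n_iter=500, decay=0.9):
    """Accept the learned step only if its fixed-point residual ||y - T(y)||
    is at most decay * (last accepted residual); otherwise fall back to T."""
    x = x0
    mu = np.linalg.norm(x - T(x))  # reference residual
    n_accepted = 0
    for _ in range(n_iter):
        y = learned_update(x)
        res_y = np.linalg.norm(y - T(y))
        if res_y <= decay * mu:
            x, mu = y, res_y  # learned step looks good: take it, tighten mu
            n_accepted += 1
        else:
            x = T(x)  # safeguard activated: classical convergent step
    return x, n_accepted


# Example: minimize f(x) = 0.5 x^T Q x - c^T x, whose minimizer is Q^{-1} c.
Q = np.diag([1.0, 10.0])
c = np.array([1.0, 1.0])
step = 1.0 / 10.0  # 1/L, where L = 10 is the largest eigenvalue of Q


def T(x):
    """One gradient descent step with step size 1/L (convergent here)."""
    return x - step * (Q @ x - c)


def learned_update(x):
    """Hypothetical stand-in for a trained L2O update: a step 2.5x too long,
    which speeds up the first coordinate but diverges in the second."""
    return x - 2.5 * step * (Q @ x - c)


x, n_accepted = safeguarded_l2o(T, learned_update, np.zeros(2))
print(x)  # approximately Q^{-1} c = [1.0, 0.1]
```

In this sketch every accepted step lowers the reference residual by at least the factor `decay`, and every rejected step is an ordinary gradient step that does not increase the residual, so the iterates converge even though `learned_update` on its own diverges from the same starting point.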