Hongyu Cheng

LG
h-index: 6
4 papers
34 citations
Novelty: 53%
AI Score: 48

4 Papers

LG, May 8
Sample Complexity of Stochastic Optimization with Integer Variables

Hongyu Cheng, Yinghao Zheng, Marco Molinaro et al.

We establish sample complexity results for stochastic optimization over the integers, with a view to understanding how the complexity compares with that of the corresponding continuous optimization problem. We show that integer optimization can sometimes require strictly more samples and sometimes strictly fewer samples, depending on the structure of the objective and constraints.
1. For Lipschitz objectives over subsets of the $\ell_\infty$ ball, the statistical complexity of general stochastic mixed-integer, nonlinear, nonconvex optimization is exactly the same as that of stochastic linear optimization with just bound constraints.
2. For Lipschitz objectives over subsets of the $\ell_2$ ball, we show that integer optimization can require a strictly *smaller* sample size than the continuous setting in a certain regime. To get to this result, we also establish tight sample complexity results for nonconvex continuous stochastic optimization which, to the best of our knowledge, do not appear in prior work.
3. For strongly convex, smooth objectives, integer optimization has higher statistical complexity than the continuous setting. In particular, we show that integer optimization requires $\Omega(1/\varepsilon^2)$ samples to report an $\varepsilon$-approximate solution, compared to the well-known $O(1/\varepsilon)$ sample complexity from the continuous optimization literature.
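
A one-dimensional toy instance may help build intuition for the gap in item 3. This is an illustrative sketch under our own assumptions (the quadratic objective and the symbols $\mu$, $\sigma$, $\delta$, $n$, $\bar{\xi}_n$ are not taken from the abstract), and it is not necessarily the construction used in the paper.

```latex
\begin{align*}
F(x) &= \mathbb{E}\,(x-\xi)^2 = (x-\mu)^2 + \sigma^2,
  \qquad \mu = \mathbb{E}\,\xi,\quad \sigma^2 = \operatorname{Var}(\xi),\\[4pt]
\text{continuous:}\quad
  & F(\bar{\xi}_n) - \min_{x\in\mathbb{R}} F(x) = (\bar{\xi}_n-\mu)^2,
  \qquad \mathbb{E}\,(\bar{\xi}_n-\mu)^2 = \frac{\sigma^2}{n},\\[4pt]
\text{integer, } \mu = \tfrac12 + \delta:\quad
  & F(0) - F(1) = \mu^2 - (1-\mu)^2 = 2\mu - 1 = 2\delta .
\end{align*}
```

In the continuous case, plugging in the sample mean $\bar{\xi}_n$ of $n$ samples gives expected excess risk $\sigma^2/n$, so roughly $\sigma^2/\varepsilon$ samples suffice. In the integer case with $|\delta| < 1/2$, the best integer point is $1$ when $\delta > 0$ and $0$ when $\delta < 0$, and picking the wrong one costs $2|\delta|$. Reaching accuracy $\varepsilon < 2|\delta|$ therefore amounts to learning the sign of $\delta$, which for typical noise takes on the order of $\sigma^2/\delta^2$ samples.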

LG, Feb 4, 2024
Sample Complexity of Algorithm Selection Using Neural Networks and Its Applications to Branch-and-Cut

Hongyu Cheng, Sammy Khalife, Barbara Fiedorowicz et al.

Data-driven algorithm design is a paradigm that uses statistical and machine learning techniques to select from a class of algorithms for a computational problem an algorithm that has the best expected performance with respect to some (unknown) distribution on the instances of the problem. We build upon recent work in this line of research by considering the setup where, instead of selecting a single algorithm that has the best performance, we allow the possibility of selecting an algorithm based on the instance to be solved, using neural networks. In particular, given a representative sample of instances, we learn a neural network that maps an instance of the problem to the most appropriate algorithm for that instance. We formalize this idea and derive rigorous sample complexity bounds for this learning problem, in the spirit of recent work in data-driven algorithm design. We then apply this approach to the problem of making good decisions in the branch-and-cut framework for mixed-integer optimization (e.g., which cut to add?). In other words, the neural network will take as input a mixed-integer optimization instance and output a decision that will result in a small branch-and-cut tree for that instance. Our computational results provide evidence that our particular way of using neural networks for cut selection can make a significant impact in reducing branch-and-cut tree sizes, compared to previous data-driven approaches.
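
The difference between selecting one algorithm for the whole distribution and selecting per instance can be sketched on a tiny sample. Everything below is hypothetical: the cost table holds made-up toy numbers (think of tree sizes), each instance is reduced to a single feature, and a simple threshold rule stands in for the neural network described in the abstract.

```python
def best_single_algorithm(costs):
    """Return (index, average cost) of the algorithm with the lowest
    average cost over the sample. costs[i][a] is the cost of running
    algorithm a on sample instance i."""
    n_algs = len(costs[0])
    averages = [sum(row[a] for row in costs) / len(costs) for a in range(n_algs)]
    best = min(range(n_algs), key=lambda a: averages[a])
    return best, averages[best]


def selector_cost(costs, features, selector):
    """Average cost when `selector` maps each instance's feature
    to the index of the algorithm to run on that instance."""
    total = sum(row[selector(f)] for row, f in zip(costs, features))
    return total / len(costs)


# Toy data (made up for illustration): 4 instances, 2 algorithms.
costs = [[10, 40], [50, 20], [12, 30], [60, 25]]
features = [0.1, 0.9, 0.2, 0.8]


def threshold_rule(feature):
    # Hypothetical stand-in for a learned instance-to-algorithm map.
    return 0 if feature < 0.5 else 1


single_index, single_avg = best_single_algorithm(costs)
per_instance_avg = selector_cost(costs, features, threshold_rule)
```

On this sample the per-instance rule has a lower average cost than the best single algorithm. The sample complexity question in the abstract is how many sample instances are needed before such an empirical advantage carries over to the underlying distribution.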

OC, May 22, 2024
Learning Cut Generating Functions for Integer Programming

Hongyu Cheng, Amitabh Basu

The branch-and-cut algorithm is the method of choice for solving large-scale integer programming problems in practice. A key ingredient of branch-and-cut is the use of cutting planes, which are derived constraints that reduce the search space for an optimal solution. Selecting effective cutting planes to produce small branch-and-cut trees is a critical challenge in the branch-and-cut algorithm. Recent advances have employed a data-driven approach to select optimal cutting planes from a parameterized family, with the aim of reducing the branch-and-bound tree size (in expectation) for a given distribution of integer programming instances. We extend this idea to the selection of the best cut generating function (CGF), which is a tool in the integer programming literature for generating a wide variety of cutting planes that generalize the well-known Gomory Mixed-Integer (GMI) cutting planes. We provide rigorous sample complexity bounds for the selection of an effective CGF from certain parameterized families that provably performs well for any specified distribution on the problem instances. Our empirical results show that the selected CGF can outperform the GMI cuts for certain distributions. Additionally, we explore the sample complexity of using neural networks for instance-dependent CGF selection.
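
For readers unfamiliar with the GMI baseline mentioned above, the following is a minimal sketch of the standard textbook GMI formula, not code from the paper. It assumes a simplex tableau row of the form x_B + sum_j a_j x_j = b, where the basic variable x_B is integer-constrained, b is fractional, and the nonbasic variables x_j are nonnegative. The function name `gmi_cut_coefficients` and its arguments are hypothetical.

```python
import math
from fractions import Fraction


def gmi_cut_coefficients(coeffs, is_integer, rhs):
    """Coefficients c_j of the GMI cut  sum_j c_j * x_j >= 1  derived from
    the row  x_B + sum_j coeffs[j] * x_j = rhs  (textbook formula).

    is_integer[j] says whether nonbasic variable x_j is integer-constrained.
    """
    f0 = rhs - math.floor(rhs)  # fractional part of the right-hand side
    if f0 == 0:
        raise ValueError("rhs must be fractional to derive a GMI cut")
    cut = []
    for a, integer in zip(coeffs, is_integer):
        if integer:
            f = a - math.floor(a)  # fractional part of the coefficient
            cut.append(f / f0 if f <= f0 else (1 - f) / (1 - f0))
        else:
            cut.append(a / f0 if a >= 0 else -a / (1 - f0))
    return cut


# Exact rational arithmetic avoids floating-point noise in fractional parts.
row = [Fraction(5, 4), Fraction(-1, 4), Fraction(2), Fraction(-1)]
kinds = [True, True, False, False]
cut = gmi_cut_coefficients(row, kinds, Fraction(7, 2))
```

Here the row has right-hand side 7/2, so f0 = 1/2, and the resulting cut is (1/2) x_1 + (1/2) x_2 + 4 x_3 + 2 x_4 >= 1. A cut generating function generalizes the two case distinctions in the loop to other functions of the coefficient.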

LG, Nov 15, 2021
Neural networks with linear threshold activations: structure and algorithms

Sammy Khalife, Hongyu Cheng, Amitabh Basu

In this article we present new results on neural networks with linear threshold activation functions. We precisely characterize the class of functions that are representable by such neural networks and show that 2 hidden layers are necessary and sufficient to represent any function representable in the class. This is a surprising result in the light of recent exact representability investigations for neural networks using other popular activation functions like rectified linear units (ReLU). We also give precise bounds on the sizes of the neural networks required to represent any function in the class. Finally, we design an algorithm to solve the empirical risk minimization (ERM) problem to global optimality for these neural networks with a fixed architecture. The algorithm's running time is polynomial in the size of the data sample, if the input dimension and the size of the network architecture are considered fixed constants. The algorithm is unique in the sense that it works for any architecture with any number of layers, whereas previous polynomial time globally optimal algorithms work only for very restricted classes of architectures. Using these insights, we propose a new class of neural networks that we call shortcut linear threshold networks. To the best of our knowledge, this way of designing neural networks has not been explored before in the literature. We show that these neural networks have several desirable theoretical properties.
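
To make the object of study concrete, here is a minimal sketch of a forward pass through a network with linear threshold activations. It assumes the activation is 1 for strictly positive input and 0 otherwise, followed by a linear output layer; the function names and the XOR example weights are hypothetical and chosen only for illustration.

```python
def threshold(z):
    """Linear threshold activation: 1 if z > 0, else 0 (assumed convention)."""
    return 1.0 if z > 0 else 0.0


def hidden_layer(weights, biases, x):
    """Apply one affine map followed by the threshold activation."""
    return [
        threshold(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(hidden_layers, out_weights, out_bias, x):
    """Evaluate the network: threshold hidden layers, then a linear output."""
    h = list(x)
    for weights, biases in hidden_layers:
        h = hidden_layer(weights, biases, h)
    return sum(w * hi for w, hi in zip(out_weights, h)) + out_bias


# Toy network computing XOR on {0,1}^2: the hidden units act as OR and AND,
# and the output is OR minus AND.
xor_hidden = [([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5])]
xor_out_weights = [1.0, -1.0]
xor_out_bias = 0.0

outputs = [
    forward(xor_hidden, xor_out_weights, xor_out_bias, x)
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
```

Because every hidden unit outputs 0 or 1, such a network computes a piecewise constant function of its input. This toy function happens to need only one hidden layer; the abstract's claim concerns what is needed for every function in the representable class.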