From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs

arXiv:2605.2218860.5

AI Analysis

The paper addresses the bottleneck in solving discrete optimization problems with nonlinear objectives and enables certifiably optimal solutions for cardinality-constrained GLMs, which matters in high-stakes applications that require interpretable models.

The paper introduces a CPU-GPU framework that parallelizes branch and bound for optimal $k$-sparse GLMs by processing multiple nodes in batches on GPUs. It reports speedups of one to two orders of magnitude and a zero optimality gap on challenging instances.
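
For orientation, a cardinality-constrained GLM is commonly written in the form below. The notation is an assumption for illustration (the summary does not state the paper's exact formulation): $\ell$ is the GLM loss, such as the logistic loss, $(x_i, y_i)$ are the $n$ training samples, and $\beta \in \mathbb{R}^p$ is the coefficient vector.

$$
\min_{\beta \in \mathbb{R}^p} \; \sum_{i=1}^{n} \ell\left(y_i, x_i^\top \beta\right) \quad \text{subject to} \quad \|\beta\|_0 \le k
$$

Here $\|\beta\|_0$ counts the nonzero entries of $\beta$. Each branch-and-bound node fixes some coefficients to zero or marks them as selected, then bounds the objective over the remaining free coefficients.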

GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and nonlinear objectives, such as certifying optimal solutions for cardinality-constrained generalized linear models. Major challenges include the sequential processing of heterogeneous nodes in branch and bound (BnB) and frequent data movement between the CPU and GPU. We propose a simple, generic, and modular CPU--GPU framework that processes multiple BnB nodes in batches on GPUs. The framework is built around a small set of GPU-efficient routines and uses padding together with lightweight custom kernels to handle irregular node data structures. Experiments show one to two orders of magnitude speedups and zero optimality gap on challenging instances. The framework can also be extended to collect the entire Rashomon set, enabling downstream statistical analysis such as variable-importance analysis and model selection under secondary user-specific measures (e.g., AUC in classification).
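
The padding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy on the CPU as a stand-in for GPU arrays, assumes a logistic loss with labels in {-1, +1}, and the helper names `pad_supports` and `batched_logistic_loss` are hypothetical. It shows how nodes with different numbers of free variables can be packed into one rectangular batch and evaluated in a single vectorized call.

```python
import numpy as np

def pad_supports(supports, pad_value=0):
    """Pack variable-length index lists into a (B, L) array plus a boolean mask.

    Hypothetical helper: padded slots hold pad_value and are marked False.
    """
    B = len(supports)
    L = max(len(s) for s in supports)
    idx = np.full((B, L), pad_value, dtype=np.int64)
    mask = np.zeros((B, L), dtype=bool)
    for b, s in enumerate(supports):
        idx[b, :len(s)] = s
        mask[b, :len(s)] = True
    return idx, mask

def batched_logistic_loss(X, y, idx, mask, beta):
    """Logistic loss for every node in the batch at once.

    X: (n, p) features, y: (n,) labels in {-1, +1},
    idx, mask, beta: (B, L) padded supports, validity mask, coefficients.
    """
    Xb = np.transpose(X[:, idx], (1, 0, 2))   # (B, n, L): gather each node's columns
    beta = np.where(mask, beta, 0.0)          # padded coefficients contribute nothing
    margins = y[None, :] * np.einsum("bnl,bl->bn", Xb, beta)  # (B, n)
    return np.logaddexp(0.0, -margins).sum(axis=1)            # (B,)

# Usage: three nodes with supports of size 2, 3, and 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 6))
y = np.where(rng.standard_normal(20) > 0, 1.0, -1.0)
supports = [[0, 2], [1, 3, 5], [4]]
idx, mask = pad_supports(supports)
beta = rng.standard_normal(idx.shape)
losses = batched_logistic_loss(X, y, idx, mask, beta)
```

The mask is what keeps the padded slots from affecting the result, so heterogeneous nodes share one regular tensor layout. On a GPU the same pattern would use a device array library in place of NumPy.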
