Boost Like a (Var)Pro: Trust-Region Gradient Boosting via Variable Projection

Abhijit Chowdhary, Elizabeth Newman, Deepanshu Verma

arXiv:2603.23658

AI Analysis

This work addresses a gap in boosting methods for smooth learners, offering a theoretically grounded approach with practical improvements for machine learning applications.

The paper addresses gradient boosting for smooth parametric learners such as neural networks, for which training methodology and theory are less developed than for tree-based methods. It introduces VPBoost, a variable projection-based algorithm that enforces optimality of the linear weights and uses a second-order weak learning strategy. The method is proven to converge to a stationary point under mild geometric conditions and to achieve a superlinear rate under stronger assumptions. On benchmarks, it improves evaluation metrics over gradient-descent-based boosting and is competitive with an industry-standard decision tree boosting algorithm.

Gradient boosting, a method of building additive ensembles from weak learners, has established itself as a practical and theoretically motivated approach to approximating functions, especially with decision tree weak learners. Comparable methods for smooth parametric learners, such as neural networks, remain less developed in both training methodology and theory. To this end, we introduce \texttt{VPBoost} ({\bf V}ariable {\bf P}rojection {\bf Boost}ing), a gradient boosting algorithm for separable smooth approximators, i.e., models with a smooth nonlinear featurizer followed by a final linear mapping. \texttt{VPBoost} fuses variable projection, a training paradigm for separable models that enforces optimality of the linear weights, with a second-order weak learning strategy. The combination of second-order boosting, separable models, and variable projection gives rise to a closed-form solution for the optimal linear weights and a natural interpretation of \texttt{VPBoost} as a functional trust-region method. We thereby leverage trust-region theory to prove that \texttt{VPBoost} converges to a stationary point under mild geometric conditions and, under stronger assumptions, achieves a superlinear convergence rate. Comprehensive numerical experiments on synthetic data, image recognition, and scientific machine learning benchmarks demonstrate that \texttt{VPBoost} learns an ensemble with improved evaluation metrics in comparison to gradient-descent-based boosting and attains competitive performance relative to an industry-standard decision tree boosting algorithm.
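
To make the idea of a separable weak learner with closed-form linear weights concrete, the following is a minimal NumPy sketch. It is not the authors' \texttt{VPBoost} implementation. It assumes a squared loss (for which the second-order boosting target coincides with the residual), a one-layer tanh featurizer, a small ridge term, and randomly drawn featurizer parameters in place of the trust-region optimization described in the abstract. All function names (`featurize`, `optimal_linear_weights`, `projected_loss`) are hypothetical.

```python
import numpy as np


def featurize(X, theta):
    """Smooth nonlinear featurizer: a single tanh layer (illustrative choice)."""
    return np.tanh(X @ theta)


def optimal_linear_weights(F, r, reg=1e-8):
    """Closed-form minimizer of ||F w - r||^2 + reg * ||w||^2 over w."""
    k = F.shape[1]
    return np.linalg.solve(F.T @ F + reg * np.eye(k), F.T @ r)


def projected_loss(theta, X, r, reg=1e-8):
    """Reduced objective: loss in theta after eliminating the linear weights."""
    F = featurize(X, theta)
    w = optimal_linear_weights(F, r, reg)
    return 0.5 * np.mean((F @ w - r) ** 2), w


# Synthetic regression problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

ensemble = np.zeros_like(y)
losses = [0.5 * np.mean((ensemble - y) ** 2)]

for _ in range(3):
    # For squared loss the Hessian is the identity, so the Newton target
    # is simply the current residual.
    r = y - ensemble
    # Assumption: theta is drawn at random here; a real method would
    # optimize theta against the reduced objective projected_loss.
    theta = rng.normal(size=(3, 8))
    _, w = projected_loss(theta, X, r)
    ensemble = ensemble + featurize(X, theta) @ w
    losses.append(0.5 * np.mean((ensemble - y) ** 2))
```

Because `w = 0` is always feasible in the least-squares subproblem, each round in this sketch cannot increase the training loss, whichever `theta` is drawn. Eliminating `w` in closed form leaves an objective in `theta` alone, which is the core of variable projection.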
