Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
This addresses the challenge of efficient and effective oblique decision tree learning for machine learning practitioners, offering a novel optimization approach with proven convergence and approximation guarantees.
The paper tackled the NP-hard problem of learning high-quality oblique splits in decision trees by introducing the Hinge Regression Tree (HRT), which reframes splits as a non-linear least-squares problem and uses a damped Newton method, resulting in fast convergence and matching or outperforming baselines with more compact structures.
Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss-Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(δ^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.