Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression
This work addresses the underuse of smooth models in tabular regression, offering CPU-viable alternatives with tighter generalisation gaps for applications such as surrogate optimisation, although its contribution is an incremental improvement on existing methods.
The study evaluates smooth-basis models, namely Chebyshev polynomial regressors and anisotropic RBF networks, for tabular regression, benchmarking them against tree ensembles and a pre-trained transformer across 55 datasets. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but smooth models tend to show tighter generalisation gaps.
Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.
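To make the idea of a ridge-regularised Chebyshev polynomial regressor concrete, the following is a minimal NumPy sketch. It is not the released scikit-learn-compatible package. The function names (`cheb_features`, `fit_ridge_cheb`, `predict`, `generalisation_gap`) are hypothetical, and so are the additive per-feature basis without cross terms and the definition of the generalisation gap as test RMSE minus train RMSE. All of these are assumptions made for illustration and may differ from the paper's choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def cheb_features(X, degree, lo, hi):
    """Per-column Chebyshev features T_1..T_degree (assumed additive basis)."""
    # Map each column to [-1, 1], the natural domain of Chebyshev polynomials.
    span = np.where(hi > lo, hi - lo, 1.0)
    Z = np.clip(2.0 * (X - lo) / span - 1.0, -1.0, 1.0)
    # chebvander returns T_0..T_degree; drop the constant column T_0.
    cols = [C.chebvander(Z[:, j], degree)[:, 1:] for j in range(Z.shape[1])]
    return np.hstack(cols)


def fit_ridge_cheb(X, y, degree=5, alpha=1e-3):
    """Closed-form ridge fit on centred Chebyshev features."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    Phi = cheb_features(X, degree, lo, hi)
    phi_mean, y_mean = Phi.mean(axis=0), y.mean()
    Pc = Phi - phi_mean  # centring leaves the intercept unpenalised
    A = Pc.T @ Pc + alpha * np.eye(Pc.shape[1])
    w = np.linalg.solve(A, Pc.T @ (y - y_mean))
    return {"w": w, "b": y_mean - phi_mean @ w,
            "lo": lo, "hi": hi, "degree": degree}


def predict(model, X):
    Phi = cheb_features(X, model["degree"], model["lo"], model["hi"])
    return Phi @ model["w"] + model["b"]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def generalisation_gap(model, X_tr, y_tr, X_te, y_te):
    """Assumed definition: test RMSE minus train RMSE."""
    return rmse(y_te, predict(model, X_te)) - rmse(y_tr, predict(model, X_tr))
```

The prediction is a polynomial in the rescaled inputs, so it is continuously differentiable inside the training range. Test inputs outside that range are clipped to its boundary in this sketch, which keeps extrapolation bounded at the cost of a flat response there.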