13.4 ML Jun 1
ScoreStop: Gradient-based early stopping using functional score tests
Oliver J. Hines, Christian L. Hines
Gradient boosted decision trees require a stopping rule to avoid overfitting. The standard rule monitors a validation loss and stops if the loss fails to improve for a fixed patience period. However, the patience parameter has no interpretable scale, and validation losses can be noisy or only implicitly defined by a user-specified gradient. We propose ScoreStop, a gradient-based early-stopping rule that casts the stopping decision at each iteration as a test of the null hypothesis that the current predictor is the population risk minimizer. We use a functional score test, computed on validation data, whose statistic is scale-invariant in the update direction and has a known asymptotic distribution under the null. Because our test uses gradients rather than loss values, the same construction applies to implicit losses such as LambdaRank and, via influence functions, to data-dependent losses such as Cox regression. In synthetic experiments and real-data benchmarks, we show that ScoreStop is competitive with loss-based methods.
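For reference, the standard patience rule that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the loss-based baseline only, not of ScoreStop; the function name `patience_stop` and the example loss values are hypothetical.

```python
def patience_stop(val_losses, patience):
    """Apply a patience rule to a sequence of validation losses.

    Returns (best_iter, stop_iter): the iteration with the lowest loss seen
    so far, and the iteration at which training would stop. Training stops
    once `patience` consecutive iterations fail to improve on the best loss.
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # Strict improvement resets the patience counter.
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            return best_iter, i
    # The rule never triggered: training ran through every iteration.
    return best_iter, len(val_losses) - 1


# Hypothetical noisy validation curve with a late improvement.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
short = patience_stop(losses, patience=3)
long = patience_stop(losses, patience=5)
```

On this made-up curve a patience of 3 stops before the late improvement at the final iteration, while a patience of 5 reaches it, which illustrates why the choice of patience matters when the validation loss is noisy.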
ML Sep 29, 2024
Automatic debiasing of neural networks via moment-constrained learning
Christian L. Hines, Oliver J. Hines
Causal and nonparametric estimands in economics and biostatistics can often be viewed as the mean of a linear functional applied to an unknown outcome regression function. Naively learning the regression function and taking a sample mean of the target functional results in biased estimators, and a rich debiasing literature has developed where one additionally learns the so-called Riesz representer (RR) of the target estimand (targeted learning, double ML, automatic debiasing, etc.). Learning the RR via its derived functional form can be challenging, e.g. due to extreme inverse probability weights or the need to learn conditional density functions. Such challenges have motivated recent advances in automatic debiasing (AD), where the RR is learned directly via minimization of a bespoke loss. We propose moment-constrained learning as a new RR learning approach that addresses some shortcomings in AD, constraining the predicted moments and improving the robustness of RR estimates to optimization hyperparameters. Though our approach is not tied to a particular class of learner, we illustrate it using neural networks, and evaluate it on the problems of average treatment/derivative effect estimation using semi-synthetic data. Our numerical experiments show improved performance versus state-of-the-art benchmarks.
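To make the role of the Riesz representer concrete, the sketch below checks the representation numerically for the average treatment effect, where the RR takes the familiar inverse-probability-weight form. It is an illustration under stated assumptions, not the moment-constrained method: the outcome regression `g`, the logistic propensity, and the sample size are all hypothetical, and the propensity is treated as known rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
# Assumed known propensity score P(A = 1 | X = x); in practice it is unknown.
propensity = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, propensity)


def g(a, x):
    """Hypothetical outcome regression E[Y | A = a, X = x]."""
    return 1.0 + 2.0 * a + x + 0.5 * a * x


# The linear functional for the average treatment effect: g(1, X) - g(0, X).
functional_mean = np.mean(g(1, x) - g(0, x))

# The Riesz representer for this functional is the inverse probability weight.
alpha = a / propensity - (1 - a) / (1 - propensity)
riesz_mean = np.mean(alpha * g(a, x))

# Largest weight magnitude, to show how extreme the weights can become.
max_weight = np.max(np.abs(alpha))
```

Both sample means estimate the same quantity (here 2, since the simulated covariate has mean zero), but the second relies on weights that grow large wherever the propensity is close to 0 or 1. That is the difficulty the abstract refers to when learning the RR through its derived functional form.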
LG Sep 16, 2025
TimeCluster with PCA is Equivalent to Subspace Identification of Linear Dynamical Systems
Christian L. Hines, Samuel Spillard, Daniel P. Martin
TimeCluster is a visual analytics technique for discovering structure in long multivariate time series by projecting overlapping windows of data into a low-dimensional space. We show that, when Principal Component Analysis (PCA) is chosen as the dimensionality reduction technique, this procedure is mathematically equivalent to classical linear subspace identification (block-Hankel matrix plus Singular Value Decomposition (SVD)). In both approaches, the same low-dimensional linear subspace is extracted from the time series data. We first review the TimeCluster method and the theory of subspace system identification. Then we show that forming the sliding-window matrix of a time series yields a Hankel matrix, so applying PCA (via SVD) to this matrix recovers the same principal directions as subspace identification. Thus the cluster coordinates from TimeCluster coincide with those obtained from subspace identification methods. We present experiments on synthetic and real dynamical signals confirming that the two embeddings coincide. Finally, we discuss future opportunities enabled by this equivalence, including forecasting from the identified state space, streaming/online extensions, incorporating and visualising external inputs, and robust techniques for displaying underlying trends in corrupted data.
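The central observation, that stacking overlapping windows produces a Hankel matrix whose SVD gives the low-dimensional coordinates, can be sketched for a univariate series as follows. This is a simplified illustration rather than the authors' implementation: the helper names are hypothetical, the series is a synthetic sinusoid, windows are stored as rows (subspace identification texts often use the transpose), and the mean-centering step of standard PCA is omitted.

```python
import numpy as np


def sliding_window_matrix(x, window):
    """Stack overlapping length-`window` segments of a 1-D series as rows."""
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - window + 1
    return np.stack([x[i:i + window] for i in range(n_rows)])


def hankel_matrix(x, window):
    """Build the Hankel matrix directly from its definition H[i, j] = x[i + j]."""
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - window + 1
    i, j = np.indices((n_rows, window))
    return x[i + j]


def svd_embedding(matrix, rank):
    """Project rows onto the top `rank` right-singular vectors (uncentered PCA)."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return matrix @ vt[:rank].T


# Synthetic signal: a single sinusoid, generated by a 2-state linear system.
t = np.arange(200)
series = np.sin(0.1 * t)

W = sliding_window_matrix(series, window=20)
H = hankel_matrix(series, window=20)
same_matrix = np.array_equal(W, H)

embedding = svd_embedding(W, rank=2)
singular_values = np.linalg.svd(W, compute_uv=False)
```

Because a pure sinusoid satisfies a second-order linear recurrence, the window matrix has numerical rank 2: only the first two singular values are non-negligible, and the two-dimensional embedding traces out the closed orbit of the underlying state.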