LG, ST, AP | Oct 6, 2021

The Variability of Model Specification

arXiv:2110.02490v1
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of controlling model variance, a concern for practitioners in statistics and machine learning, but its contribution is incremental because it builds on existing bias-variance trade-off concepts.

The paper investigates how misspecifying regression models, such as generalized linear models and Cox proportional hazard models, affects model variance, showing that complexity increases variance even when training cost is minimized.

It is regarded as an axiom that a good model is one that compromises between bias and variance. The bias is measured by the training cost, while the variance of a (say, regression) model is measured by the cost associated with a validation set. If reducing bias is the goal, one will strive for a model as complex as necessary, but complexity is invariably coupled with variance: greater complexity implies greater variance. In practice, driving the training cost to near zero does not pose a fundamental problem; in fact, a sufficiently complex decision tree is perfectly capable of driving the training cost to zero. The problem is more often one of controlling the model's variance. We investigate various regression model frameworks, including generalized linear models, Cox proportional hazard models, and ARMA models, and illustrate how misspecifying a model affects the variance.
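The claim that a sufficiently complex decision tree can drive the training cost to zero can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic data (a noisy sine curve), the helper names `fit_tree`, `predict`, and `mse`, and the use of median splits instead of the variance-minimizing splits that a real regression tree would use.

```python
import math
import random

def fit_tree(points, depth):
    """Grow a 1-D regression tree with median splits (a simplification)."""
    ys = [y for _, y in points]
    mean = sum(ys) / len(ys)
    if depth == 0 or len(points) < 2:
        return mean  # leaf: predict the mean response of its points
    points = sorted(points)
    mid = len(points) // 2
    threshold = (points[mid - 1][0] + points[mid][0]) / 2
    left = fit_tree(points[:mid], depth - 1)
    right = fit_tree(points[mid:], depth - 1)
    return (threshold, left, right)

def predict(tree, x):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree

def mse(tree, points):
    """Mean squared error, used as both training and validation cost."""
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)

def sample(rng, n, noise_sd=0.5):
    """Hypothetical data: y = sin(2*pi*x) + Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0, noise_sd)) for x in xs]

rng = random.Random(0)
train = sample(rng, 64)
valid = sample(rng, 64)

# Depth 6 splits 64 training points down to one point per leaf.
results = {}
for depth in (2, 6):
    tree = fit_tree(train, depth)
    results[depth] = (mse(tree, train), mse(tree, valid))
    print(f"depth={depth}: train cost={results[depth][0]:.4f}, "
          f"validation cost={results[depth][1]:.4f}")
```

With 64 training points, the depth-6 tree isolates each point in its own leaf, so its training cost is exactly zero, while its validation cost stays strictly positive because the tree has memorized the noise. How the validation costs of the shallow and deep trees compare depends on the noise level and the seed, so the printed numbers should be read as one draw and not as a general result.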
