LG ST AP
Oct 6, 2021

The Variability of Model Specification

Joseph R. Barr, Peter Shaw, Marcus Sobel

arXiv:2110.02490v1

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of controlling model variance, a concern for practitioners in statistics and machine learning, but it is incremental: it builds on existing bias-variance trade-off concepts.

The paper investigates how misspecifying regression models, such as generalized linear models and Cox proportional hazard models, affects model variance, showing that complexity increases variance even when training cost is minimized.

It's regarded as an axiom that a good model is one that compromises between bias and variance. The bias is measured in training cost, while the variance of a (say, regression) model is measured by the cost associated with a validation set. If reducing bias is the goal, one will strive to fetch as complex a model as necessary, but complexity is invariably coupled with variance: greater complexity implies greater variance. In practice, driving training cost to near zero does not pose a fundamental problem; in fact, a sufficiently complex decision tree is perfectly capable of driving training cost to zero. The problem, however, is often with controlling the model's variance. We investigate various regression model frameworks, including generalized linear models, Cox proportional hazard models, and ARMA models, and illustrate how misspecifying a model affects the variance.
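The abstract's claim that a model can reach zero training cost yet suffer from high variance can be illustrated with a small simulation. The sketch below is not the authors' experiment: the sine-shaped target, the noise level, the sample sizes, and all function names are assumptions chosen for illustration. The "memorizer" stands in for a fully grown regression tree; in one dimension, a tree grown until every training point has its own leaf, with splits at midpoints, predicts the same values as a 1-nearest-neighbor rule. The simulation refits both models on many noisy training sets and compares how much their predictions at one fixed point vary.

```python
import math
import random


def make_training_set(rng, n=50, sigma=0.5):
    """Fixed grid of x values on [0, 2*pi]; y = sin(x) + Gaussian noise (assumed setup)."""
    xs = [2 * math.pi * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Low-complexity model: ordinary least-squares straight line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """High-complexity model: return the y of the nearest training x."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(0)
x0 = 1.0  # fixed query point at which prediction variance is measured
line_preds, memo_preds, memo_train_costs = [], [], []

# Refit both models on 200 independent noisy training sets.
for _ in range(200):
    xs, ys = make_training_set(rng)
    line_preds.append(fit_line(xs, ys)(x0))
    memo = fit_memorizer(xs, ys)
    memo_preds.append(memo(x0))
    memo_train_costs.append(mse(memo, xs, ys))

max_memo_train_cost = max(memo_train_costs)
line_variance = sample_variance(line_preds)
memo_variance = sample_variance(memo_preds)

print("largest memorizer training cost:", max_memo_train_cost)
print("variance of line prediction at x0:", line_variance)
print("variance of memorizer prediction at x0:", memo_variance)
```

Under these assumptions the memorizer reproduces every training label, so its training cost is zero on each run, while its prediction at `x0` inherits the full noise of a single observation. The straight line averages over all points, so its prediction varies far less across training sets, at the price of a biased fit to the sine curve.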
