The Pivotal Information Criterion
This paper addresses model selection for practitioners in statistics and machine learning, offering an incremental improvement over existing criteria.
The paper addresses two problems with model selection criteria such as BIC and AIC: their penalty parameters are too small, which leads to false discoveries, and their discrete optimization is infeasible in high dimensions. It introduces the Pivotal Information Criterion (PIC), which is defined as a continuous optimization problem and whose penalty parameter is selected at the detection boundary. In simulations, PIC exhibits a phase transition in the probability of exact support recovery; on real data, it selects the least complex model among learners with similar predictive performance.
The Bayesian and Akaike information criteria aim at finding a good balance between under- and over-fitting. They are used extensively every day by practitioners. Yet we contend they suffer from at least two afflictions: their penalty parameters, $λ=\log n$ and $λ=2$ respectively, are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimensions. We alleviate these issues with the pivotal information criterion (PIC): PIC is defined as a continuous optimization problem, and the PIC penalty parameter $λ$ is selected at the detection boundary (under pure noise). PIC's choice of $λ$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied in the noiseless setting in compressed sensing. Applied to real data, PIC selects the least complex model among state-of-the-art learners with similar predictive performance.
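To make the idea of choosing $λ$ as a quantile of a pivotal statistic under pure noise concrete, here is a minimal sketch. It does not reproduce the paper's statistic, which the abstract does not specify. It assumes a Gaussian linear model and a square-root-transformed least-squares loss, for which the all-zero fit is optimal exactly when $\|X^\top y\|_\infty / \|y\|_2 \le λ$. Under pure noise this ratio does not depend on the noise level, so its quantile can be simulated without estimating the variance. The function names, the level `alpha`, and the design matrix are hypothetical choices made for illustration.

```python
import numpy as np


def null_statistic(X, y):
    """||X^T y||_inf / ||y||_2: the smallest penalty for which the
    square-root-loss lasso fit is all zeros (assumed setting, see lead-in)."""
    return np.abs(X.T @ y).max() / np.linalg.norm(y)


def pivotal_lambda(X, alpha=0.05, sigma=1.0, n_sim=2000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of null_statistic under pure Gaussian noise.

    sigma is exposed only to illustrate that the result does not depend on it.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row is one simulated pure-noise response vector.
    noise = sigma * rng.standard_normal((n_sim, n))
    stats = np.abs(noise @ X).max(axis=1) / np.linalg.norm(noise, axis=1)
    return float(np.quantile(stats, 1.0 - alpha))


# Hypothetical design matrix, for illustration only.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 20))

lam_unit = pivotal_lambda(X, sigma=1.0)
lam_big = pivotal_lambda(X, sigma=10.0)
print(lam_unit, lam_big)  # the two values agree up to floating-point rounding
```

With this choice of $λ$, a pure-noise response yields the empty model with probability of about $1-\alpha$ in the assumed setting, which is one reading of selecting the penalty at the detection boundary. Without the square-root transformation the corresponding statistic would scale with the noise level and would no longer be pivotal.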