LG | ML | Sep 15, 2024

Model Selection Through Model Sorting

arXiv:2409.09674v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses model selection for machine learning practitioners with a theoretically grounded method. It appears to be an incremental advance, since it builds on established concepts of nested models and empirical risk.

The paper tackles model selection by proposing a method to find the most parsimonious model containing the risk minimizer predictor, and proves PAC bounds on the successive empirical excess risk (SEER). It shows that, in linear regression, the S-NER method without prior information can outperform feature sorting algorithms such as OMP that are given the true model order, and that the NER method reduces classification complexity on the UCR datasets with negligible accuracy loss.

We propose a novel approach to selecting the best model of the data. Based on the exclusive properties of nested models, we find the most parsimonious model containing the risk minimizer predictor. We prove the existence of probably approximately correct (PAC) bounds on the difference between the minimum empirical risks of two successive nested models, called the successive empirical excess risk (SEER). Based on these bounds, we propose a model order selection method called nested empirical risk (NER). The sorted NER (S-NER) method sorts the models intelligently so that the minimum risk decreases. We construct a test that predicts whether expanding the model decreases the minimum risk. With high probability, NER and S-NER choose the true model order and the most parsimonious model containing the risk minimizer predictor, respectively. We apply S-NER model selection to linear regression and show that the S-NER method, without any prior information, can outperform the accuracy of feature sorting algorithms such as orthogonal matching pursuit (OMP) that are aided by prior knowledge of the true model order. In addition, on the UCR datasets, the NER method dramatically reduces the complexity of classification with a negligible loss of accuracy.
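The idea of thresholding the drop in minimum empirical risk between successive nested models can be illustrated with a minimal sketch for linear regression. This is not the paper's NER procedure: the function names are hypothetical, the features are assumed to be already ordered (so no S-NER sorting is performed), and the fixed `threshold` is a placeholder standing in for the PAC bound, which the abstract does not state.

```python
import numpy as np

def min_empirical_risks(X, y):
    """Minimum mean-squared error of least squares on the first k columns, k = 0..d."""
    n, d = X.shape
    risks = [float(np.mean(y ** 2))]  # k = 0: the empty model predicts zero
    for k in range(1, d + 1):
        coef, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
        residual = y - X[:, :k] @ coef
        risks.append(float(np.mean(residual ** 2)))
    return np.array(risks)

def successive_excess_risks(risks):
    """Risk drop from model k-1 to model k; entry i corresponds to k = i + 1."""
    return risks[:-1] - risks[1:]

def select_order(seer, threshold):
    """Largest model order whose risk drop exceeds the (hypothetical) threshold."""
    significant = np.flatnonzero(seer > threshold)
    return int(significant[-1]) + 1 if significant.size else 0

# Synthetic data (assumption): only the first 3 of 8 features carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(500)

risks = min_empirical_risks(X, y)
seer = successive_excess_risks(risks)
order = select_order(seer, threshold=0.05)  # placeholder, not a PAC bound
print("risk drops:", np.round(seer, 4))
print("selected order:", order)
```

Because the models are nested, each risk drop is nonnegative up to floating-point error, so the selection reduces to deciding which drops are too small to justify a larger model. In the paper that decision is made by a test derived from the PAC bounds rather than a hand-picked constant.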
