Minimum Encoding Approaches for Predictive Modeling
This work addresses model selection for statisticians and machine learning practitioners. Its contribution is incremental, since it refines existing MML approaches rather than introducing a new framework.
The paper compares the Minimum Description Length (MDL) and Minimum Message Length (MML) principles for statistical inference and model selection, and proposes two revised MML estimators. Its empirical results suggest that MDL yields more accurate predictions on small data sets, and that the revised MML estimators outperform the original estimator of Wallace and Freeman.
We analyze differences between two information-theoretically motivated approaches to statistical inference and model selection: the Minimum Description Length (MDL) principle, and the Minimum Message Length (MML) principle. Based on this analysis, we present two revised versions of MML: a pointwise estimator which gives the MML-optimal single parameter model, and a volumewise estimator which gives the MML-optimal region in the parameter space. Our empirical results suggest that with small data sets, the MDL approach yields more accurate predictions than the MML estimators. The empirical results also demonstrate that the revised MML estimators introduced here perform better than the original MML estimator suggested by Wallace and Freeman.
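Both principles rest on the idea that a good model is one that allows a short encoding of the data. The following minimal sketch illustrates that idea with a crude two-part code for Bernoulli data: the parameter is stated as one of `m` grid points (costing `log2(m)` bits under a uniform code), and the data are then encoded given that parameter. This is a generic illustration only. It is not the Wallace and Freeman estimator, nor either of the revised estimators described above, and the function names, the midpoint grid, and the uniform code over grid points are all assumptions made for this example.

```python
import math


def data_code_length(k, n, theta):
    """Bits needed to encode k successes in n Bernoulli trials, given theta."""
    return -(k * math.log2(theta) + (n - k) * math.log2(1.0 - theta))


def two_part_code_length(k, n, m):
    """Total bits when theta is stated as one of m grid points.

    Assumption: the grid uses interval midpoints, so theta is never 0 or 1,
    and each grid point costs log2(m) bits (a uniform code).
    """
    grid = [(i + 0.5) / m for i in range(m)]
    best_theta = min(grid, key=lambda t: data_code_length(k, n, t))
    model_bits = math.log2(m)
    return model_bits + data_code_length(k, n, best_theta), best_theta


def best_precision(k, n, max_m=64):
    """Grid size (1..max_m) that minimizes the total two-part code length."""
    return min(range(1, max_m + 1),
               key=lambda m: two_part_code_length(k, n, m)[0])
```

With 5 successes in 10 trials, the single grid point 0.5 already fits the data as well as any parameter can, so spending bits on a finer grid only lengthens the message. With 9 successes in 10 trials, a finer grid pays for itself. The trade-off between the precision of the stated parameter and the fit to the data is what this sketch is meant to show.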