A Novel Information-Driven Strategy for Optimal Regression Assessment
This work provides a principled alternative to conventional empirical metrics for assessing regression models when the true data-generating mechanism is unknown.
The paper tackles the problem of evaluating regression algorithms without access to the true data-generating mechanism. It introduces the Information Teacher, a data-driven framework that uses Shannon mutual information to assess global optimality with formal guarantees. Numerical experiments confirm that it detects global optimality and can serve as a surrogate for the ground-truth loss.
In Machine Learning (ML), a regression algorithm aims to minimize a loss function based on data. An assessment method in this context seeks to quantify the discrepancy between the optimal response for an input-output system and the estimate produced by a learned predictive model (the student). Evaluating the quality of a learned regressor remains challenging without access to the true data-generating mechanism, as existing data-driven assessment methods cannot certify that global optimality has been achieved. This work introduces the Information Teacher, a novel data-driven framework for evaluating regression algorithms with formal performance guarantees for assessing global optimality. The approach builds on estimating the Shannon mutual information (MI) between the input variables and the residuals, and it applies to a broad class of additive noise models. Numerical experiments confirm that the Information Teacher can detect global optimality, which corresponds to zero estimation error with respect to the true model (inaccessible in practice). It thus works as a surrogate for the ground-truth assessment loss and offers a principled alternative to conventional empirical performance metrics.
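The intuition behind the MI criterion can be illustrated with a small simulation. Under an additive noise model Y = f(X) + W with W independent of X, a student that equals f leaves residuals that are pure noise, so the MI between inputs and residuals is zero; a misspecified student leaves input-dependent structure in the residuals. The sketch below is only an illustration under stated assumptions, not the paper's procedure: the histogram plug-in estimator, the function names (`binned_mutual_information`, `true_model`, `linear_student`), the toy model, and all constants are hypothetical choices made here, and the abstract does not specify which MI estimator the Information Teacher uses.

```python
import math
import random
from collections import Counter


def binned_mutual_information(xs, rs, bins=8):
    """Plug-in (histogram) estimate of I(X; R) in nats.

    Assumption: equal-frequency binning stands in for whatever MI
    estimator the paper actually uses.
    """
    n = len(xs)

    def rank_bins(values):
        # Assign each sample to one of `bins` equal-frequency bins by rank.
        order = sorted(range(n), key=lambda i: values[i])
        labels = [0] * n
        for rank, i in enumerate(order):
            labels[i] = rank * bins // n
        return labels

    bx, br = rank_bins(xs), rank_bins(rs)
    joint = Counter(zip(bx, br))
    px, pr = Counter(bx), Counter(br)
    # Sum p(a, b) * log(p(a, b) / (p(a) * p(b))) over occupied cells.
    return sum(
        (c / n) * math.log(c * n / (px[a] * pr[b]))
        for (a, b), c in joint.items()
    )


def true_model(x):
    # Hypothetical ground-truth regression function f.
    return math.sin(2.0 * x) + 0.5 * x


def linear_student(x):
    # Hypothetical misspecified student: misses the sinusoidal term.
    return 0.5 * x


rng = random.Random(0)
n_samples = 4000
xs = [rng.uniform(-2.0, 2.0) for _ in range(n_samples)]
# Additive noise model: Y = f(X) + W, with W independent of X.
ys = [true_model(x) + rng.gauss(0.0, 0.3) for x in xs]

# Residuals of an optimal student (equal to f) and of the linear student.
res_good = [y - true_model(x) for x, y in zip(xs, ys)]
res_bad = [y - linear_student(x) for x, y in zip(xs, ys)]

mi_good = binned_mutual_information(xs, res_good)
mi_bad = binned_mutual_information(xs, res_bad)
print(f"MI(X; residual), optimal student: {mi_good:.4f} nats")
print(f"MI(X; residual), linear student:  {mi_bad:.4f} nats")
```

Because a plug-in estimate on finite data carries a small positive bias, the value for the optimal student is close to zero rather than exactly zero; in practice a decision threshold or a statistical test would be needed, which this sketch does not attempt.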