STITMLJun 15, 2012

On the Cover-Hart Inequality: What's a Sample of Size One Worth?

arXiv:1206.3381v11 citations
Originality Incremental advance
AI Analysis

This provides a theoretical explanation for the small performance gains often observed in applied classification and forecasting, as well as the effectiveness of analogy-based methods like nearest neighbors, though it is incremental in building on existing inequality theory.

The paper tackles the problem of quantifying the advantage of having a larger sample size for prediction, showing that under a broad class of loss functions (the Cover-Hart family), the best possible improvement over a predictor with a sample of size one is to halve the risk, implying that half the information from an infinite sample is contained in just one sample.

Bob predicts a future observation based on a sample of size one. Alice can draw a sample of any size before issuing her prediction. How much better can she do than Bob? Perhaps surprisingly, under a large class of loss functions, which we refer to as the Cover-Hart family, the best Alice can do is to halve Bob's risk. In this sense, half the information in an infinite sample is contained in a sample of size one. The Cover-Hart family is a convex cone that includes metrics and negative definite functions, subject to slight regularity conditions. These results may help explain the small relative differences in empirical performance measures in applied classification and forecasting problems, as well as the success of reasoning and learning by analogy in general, and nearest neighbor techniques in particular.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes