Average-Case Information Complexity of Learning
This addresses information complexity in machine learning, providing average-case bounds that are incremental improvements over prior worst-case results.
The paper tackles the problem of quantifying information leakage in learning algorithms for concept classes with VC-dimension d, showing that while worst-case leakage can be unbounded, most concepts require only O(d) bits of information, and it generalizes this to learners that do not need to know the input distribution.
How many bits of information are revealed by a learning algorithm for a concept class of VC-dimension $d$? Previous works have shown that even for $d=1$ the amount of information may be unbounded (tend to $\infty$ with the universe size). Can it be that all concepts in the class require leaking a large amount of information? We show that typically concepts do not require leakage. There exists a proper learning algorithm that reveals $O(d)$ bits of information for most concepts in the class. This result is a special case of a more general phenomenon we explore. If there is a low information learner when the algorithm {\em knows} the underlying distribution on inputs, then there is a learner that reveals little information on an average concept {\em without knowing} the distribution on inputs.