Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion
This work provides improved theoretical guarantees for generalization in machine learning, particularly for VC classes, though it is incremental in nature.
The paper tackles the problem of deriving tighter generalization bounds for VC classes by introducing a hypergeometric tail inversion and optimizing the ghost sample trick, resulting in bounds that are nearly never vacuous and consistently tighter than existing ones for reasonable dataset sizes.
We significantly improve the generalization bounds for VC classes by using two main ideas. First, we consider the hypergeometric tail inversion to obtain a very tight non-uniform distribution-independent risk upper bound for VC classes. Second, we optimize the ghost sample trick to obtain a further non-negligible gain. These improvements are then used to derive a relative deviation bound, a multiclass margin bound, as well as a lower bound. Numerical comparisons show that the new bound is nearly never vacuous, and is tighter than other VC bounds for all reasonable data set sizes.