Statistically Valid Information Bottleneck via Multiple Hypothesis Testing
The work gives machine learning practitioners who use information bottleneck (IB) frameworks a statistically valid solution, addressing a key limitation of heuristic hyperparameter tuning. The contribution is incremental in that it builds on existing IB solvers.
The paper tackles the lack of statistical guarantees in information bottleneck (IB) methods by introducing IB-MHT, which ensures that the IB constraints are met with high probability regardless of dataset size. It reports that IB-MHT outperforms conventional methods in statistical robustness and reliability.
The information bottleneck (IB) problem is a widely studied framework in machine learning for extracting compressed features that are informative for downstream tasks. However, current approaches to solving the IB problem rely on heuristic tuning of hyperparameters, offering no guarantees that the learned features satisfy information-theoretic constraints. In this work, we introduce a statistically valid solution to this problem, referred to as IB via multiple hypothesis testing (IB-MHT), which ensures that the learned features meet the IB constraints with high probability, regardless of the size of the available dataset. The proposed methodology builds on Pareto testing and learn-then-test (LTT), and it wraps around existing IB solvers to provide statistical guarantees on the IB constraints. We demonstrate the performance of IB-MHT on classical and deterministic IB formulations, including experiments on distillation of language models. The results show that IB-MHT outperforms conventional methods in terms of statistical robustness and reliability.
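To make the learn-then-test idea mentioned above more concrete, the following is a minimal sketch of fixed-sequence testing over a pre-ordered list of candidate hyperparameters. It is not the paper's implementation. It rests on several assumptions that do not come from the source: each candidate is scored by per-sample losses bounded in [0, 1] (a stand-in for an estimate of the IB constraint), the p-value uses Hoeffding's inequality, and the candidate ordering is taken as given (in Pareto testing it would be derived from a separate data split). The names `hoeffding_p_value` and `fixed_sequence_test` are hypothetical.

```python
import math


def hoeffding_p_value(losses, alpha):
    """P-value for the null hypothesis 'true risk > alpha'.

    Assumes each loss lies in [0, 1], so Hoeffding's inequality applies.
    A small p-value is evidence that the risk is at most alpha.
    """
    n = len(losses)
    empirical_risk = sum(losses) / n
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap * gap)


def fixed_sequence_test(ordered_losses, alpha, delta):
    """Test candidates in the given order, each at level delta.

    Stops at the first candidate whose null hypothesis is not rejected.
    The returned indices all satisfy 'risk <= alpha' simultaneously with
    probability at least 1 - delta (family-wise error control).
    """
    reliable = []
    for index, losses in enumerate(ordered_losses):
        if hoeffding_p_value(losses, alpha) > delta:
            break  # first failure ends the sequence
        reliable.append(index)
    return reliable


# Hypothetical calibration losses for four candidates, already ordered
# from most to least likely to satisfy the constraint.
calibration = [[0.0] * 200, [0.1] * 200, [0.28] * 200, [0.0] * 200]
print(fixed_sequence_test(calibration, alpha=0.3, delta=0.05))  # [0, 1]
```

In this toy run the third candidate is too close to the threshold to be certified with 200 samples, so testing stops there and the fourth candidate is never examined, even though its empirical risk is low. This is the price of fixed-sequence testing: the quality of the ordering determines how many candidates can be certified.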