Reweighting Improves Conditional Risk Bounds
This work addresses the need for better risk bounds in machine learning for scenarios with imbalanced or heteroscedastic data, though it appears incremental as it builds on existing weighted ERM frameworks.
The paper tackles the problem of improving performance in specific data sub-regions through weighted empirical risk minimization. It shows that this approach achieves tighter error bounds than standard ERM in large-margin sub-regions for classification and in low-variance sub-regions for heteroscedastic regression, and it supports these results with synthetic data experiments.
In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated into the empirical risk being minimized. We show that, under a general ``balanceable'' Bernstein condition, one can design a weighted ERM estimator that outperforms the standard ERM estimator in certain sub-regions, with the improvement appearing as a data-dependent constant term in the error bound. These sub-regions correspond to large-margin regions in classification settings and low-variance regions in heteroscedastic regression settings. Our findings are supported by evidence from synthetic data experiments.
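To make the weighted ERM idea concrete, the following is a minimal sketch for heteroscedastic linear regression with squared loss. It is not the estimator constructed in the paper: the function names (`weighted_erm_linear`, `make_data`), the noise model, and the inverse-variance weights are all illustrative assumptions, and the noise level is treated as known, whereas the abstract describes a data-dependent weight function.

```python
import numpy as np


def weighted_erm_linear(X, y, weights):
    """Minimise (1/n) * sum_i w_i * (y_i - x_i^T beta)^2 over beta.

    Scaling each row by sqrt(w_i) turns the weighted problem into an
    ordinary least-squares problem, which np.linalg.lstsq solves.
    """
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def make_data(n, rng, beta_true=(1.0, 2.0)):
    """Hypothetical synthetic data: noise is small near x = 0 and grows with |x|."""
    x = rng.uniform(-1.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])       # intercept + slope design
    sigma = 0.1 + 2.0 * np.abs(x)              # assumed noise level per point
    y = X @ np.asarray(beta_true) + sigma * rng.normal(size=n)
    return X, y, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, sigma = make_data(500, rng)

    # Standard ERM: every sample has weight 1.
    beta_erm = weighted_erm_linear(X, y, np.ones(len(y)))

    # Weighted ERM: assumed inverse-variance weights emphasise low-noise points.
    beta_w = weighted_erm_linear(X, y, 1.0 / sigma**2)

    # Compare squared prediction error restricted to the low-variance sub-region.
    low_var = sigma < 0.5
    truth = X @ np.array([1.0, 2.0])
    for name, beta in [("ERM", beta_erm), ("weighted ERM", beta_w)]:
        err = np.mean((X[low_var] @ beta - truth[low_var]) ** 2)
        print(f"{name}: low-variance-region error = {err:.5f}")
```

With unit weights the function reduces to ordinary least squares, so standard ERM is the special case in which the weight function is constant. The comparison restricted to the low-variance region mirrors the kind of sub-region evaluation the abstract refers to, but any numbers it prints depend on the assumed noise model and seed.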