Automated Imbalanced Learning
This paper addresses the handling of imbalanced data in AutoML, which matters to practitioners working with real-world datasets, but the contribution is incremental because it builds on existing frameworks.
The paper tackles the problem that AutoML methods struggle with imbalanced data. It introduces a new benchmark, proposes strategies for handling imbalance, and integrates them into an existing AutoML framework. A systematic evaluation finds that these strategies significantly increase robustness against label imbalance.
Automated Machine Learning (AutoML) has become very successful at automating the time-consuming, iterative tasks of machine learning model development. However, current methods struggle when the data is imbalanced. Many real-world datasets are naturally imbalanced, and handling imbalance improperly can yield models that are of little practical use, so the problem deserves careful treatment. First, we introduce a new benchmark to study how different AutoML methods are affected by label imbalance. Second, we propose strategies to better deal with imbalance and integrate them into an existing AutoML framework. Finally, we present a systematic study that evaluates the impact of these strategies and find that including them in AutoML systems significantly increases robustness against label imbalance.
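To make concrete why label imbalance can produce a model of little practical use, the following minimal sketch scores a classifier that always predicts the majority class. It is an illustration only, not the benchmark or the strategies from the paper: the toy labels (95 negatives, 5 positives) and the helper functions `accuracy` and `balanced_accuracy` are assumptions introduced here, and balanced accuracy (the mean of per-class recall) is just one common imbalance-aware metric.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)

print(f"accuracy:          {acc:.2f}")
print(f"balanced accuracy: {bal_acc:.2f}")
```

On this toy data, plain accuracy looks strong even though the classifier never detects a single positive example, while balanced accuracy sits at chance level. A model-selection loop that optimizes plain accuracy can therefore prefer such a degenerate model when the labels are heavily skewed.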