Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
This toolbox helps machine learning practitioners tackle the common issue of imbalanced datasets, although the contribution is incremental: it packages existing methods into a user-friendly tool.
The paper introduces Imbalanced-learn, a Python toolbox that provides a wide range of state-of-the-art methods for handling imbalanced datasets in machine learning: under-sampling, over-sampling, combinations of the two, and ensemble learning. The toolbox is fully compatible with scikit-learn and is distributed under the MIT license.
Imbalanced-learn is an open-source Python toolbox that aims to provide a wide range of methods for coping with the problem of imbalanced datasets, which is frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods fall into four groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported projects. Documentation, unit tests, and integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn.
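To make the first two groups concrete, the following is a minimal sketch of random over-sampling and random under-sampling written with the Python standard library only. It is not the toolbox's implementation: the function names `random_over_sample` and `random_under_sample`, the `seed` parameter, and the toy data are assumptions introduced for illustration.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(idx)
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res


def random_under_sample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 8 samples of class 0 and 2 samples of class 1.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_over_sample(X, y)
X_under, y_under = random_under_sample(X, y)
print(Counter(y_over))
print(Counter(y_under))
```

After resampling, both classes have the same number of samples in each result. In the toolbox itself, comparable behavior is offered by sampler classes such as `RandomOverSampler` and `RandomUnderSampler`, alongside more elaborate methods; the project documentation is the authority on their exact parameters.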