LG · Sep 21, 2016

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

Guillaume Lemaitre, Fernando Nogueira, Christos K. Aridas

arXiv:1609.06570v1 · 2481 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This toolbox helps machine learning practitioners tackle the common problem of imbalanced datasets. Its contribution is incremental, since it packages existing methods into a user-friendly tool rather than proposing new ones.

The paper introduces Imbalanced-learn, a Python toolbox that provides a wide range of state-of-the-art methods for learning from imbalanced datasets, grouped into under-sampling, over-sampling, combined over- and under-sampling, and ensemble learning. The toolbox is fully compatible with scikit-learn and is distributed under the MIT license.
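
To make the first two categories concrete, the simplest resamplers fit in a few lines of NumPy. This is a minimal sketch, not Imbalanced-learn's implementation. The function names `random_under_sample` and `random_over_sample` are hypothetical, and the sketch assumes a NumPy feature matrix `X` and a 1-D label array `y`. Under-sampling draws, without replacement, as many rows from every class as the smallest class contains. Over-sampling keeps every row and duplicates randomly chosen rows, with replacement, until every class matches the largest one.

```python
import numpy as np


def random_under_sample(X, y, rng=None):
    """Hypothetical helper: shrink every class to the size of the smallest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        # Without replacement: every kept row is a distinct original sample.
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


def random_over_sample(X, y, rng=None):
    """Hypothetical helper: grow every class to the size of the largest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        # Keep all original rows, then add duplicates drawn with replacement.
        extra = rng.choice(idx, size=n_max - idx.size, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


if __name__ == "__main__":
    # Toy data: 90 majority (class 0) and 10 minority (class 1) samples.
    gen = np.random.default_rng(0)
    X = gen.normal(size=(100, 2))
    y = np.array([0] * 90 + [1] * 10)

    X_u, y_u = random_under_sample(X, y, rng=0)
    X_o, y_o = random_over_sample(X, y, rng=0)
    print(np.bincount(y_u))  # [10 10]
    print(np.bincount(y_o))  # [90 90]
```

Both helpers index `X` and `y` with the same index array, so features and labels stay aligned after resampling. The trade-off is the usual one: under-sampling discards majority information, while random over-sampling adds no new information and can encourage overfitting to repeated minority rows.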

Imbalanced-learn is an open-source Python toolbox that aims to provide a wide range of methods for coping with imbalanced datasets, a problem frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods fall into 4 groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The proposed toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported projects. Documentation, unit tests, and integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn.
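
For the fourth group, one well-known ensemble idea is EasyEnsemble. It draws several independent balanced subsets by randomly under-sampling the majority class, fits one classifier per subset, and combines their votes. The following is a hedged, NumPy-only sketch of that idea, not the toolbox's code. It assumes binary labels with `1` as the minority class, and all function names are hypothetical. The nearest-centroid base learner is a deliberately simple stand-in chosen to keep the example dependency-free; EasyEnsemble as originally proposed trains AdaBoost on each subset.

```python
import numpy as np


def balanced_subsets(y, n_subsets=5, minority=1, rng=None):
    """Index arrays for several balanced subsets (EasyEnsemble-style sketch).

    Each subset keeps every minority sample plus an equally sized random draw,
    without replacement, from the majority class. Assumes binary labels and at
    least as many majority samples as minority samples.
    """
    rng = np.random.default_rng(rng)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    subsets = []
    for _ in range(n_subsets):
        # Independent random under-sampling of the majority class per subset.
        drawn = rng.choice(maj_idx, size=min_idx.size, replace=False)
        subsets.append(np.concatenate([min_idx, drawn]))
    return subsets


def fit_centroids(X, y):
    """Toy base learner (stand-in for AdaBoost): one mean vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict_centroids(centroids, X):
    """Assign each row of X to the class whose centroid is nearest."""
    classes = np.array(list(centroids))
    dists = np.stack(
        [np.linalg.norm(X - centroids[c], axis=1) for c in classes], axis=1
    )
    return classes[dists.argmin(axis=1)]


def majority_vote(predictions):
    """Combine 0/1 predictions from the ensemble members (ties go to class 1)."""
    predictions = np.asarray(predictions)
    return (predictions.mean(axis=0) >= 0.5).astype(int)


def easy_ensemble_predict(X_train, y_train, X_test, n_subsets=5, rng=None):
    """Fit one base learner per balanced subset, then vote on X_test."""
    votes = []
    for idx in balanced_subsets(y_train, n_subsets=n_subsets, rng=rng):
        model = fit_centroids(X_train[idx], y_train[idx])
        votes.append(predict_centroids(model, X_test))
    return majority_vote(votes)
```

Every subset contains all minority samples, so only the majority class is thinned. Repeating the draw across subsets recovers some of the majority-class information that a single under-sampling pass would throw away.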
