LG | Jul 14, 2022

problexity -- an open-source Python library for binary classification problem complexity assessment

arXiv:2207.06709v1 | 4 citations | h-index: 9
Originality: Synthesis-oriented
AI Analysis

The library gives machine learning researchers and practitioners a practical tool for analyzing dataset complexity, though the contribution is incremental: it ports existing measures to Python.

The authors tackled the lack of Python tools for assessing binary classification problem complexity by developing an open-source library that estimates 22 complexity measures, compatible with scikit-learn, to facilitate research in the machine learning community.

The classification problem's complexity assessment is an essential element of many topics in the supervised learning domain. It plays a significant role in meta-learning -- becoming the basis for determining meta-attributes or multi-criteria optimization -- allowing the evaluation of the training set resampling without needing to rebuild the recognition model. The tools currently available for the academic community, which would enable the calculation of problem complexity measures, are available only as libraries of the C++ and R languages. This paper describes the software module that allows for the estimation of 22 complexity measures for the Python language -- compatible with the scikit-learn programming interface -- allowing for the implementation of research using them in the most popular programming environment of the machine learning community.
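To make the idea of a complexity measure concrete, the sketch below computes one measure that is widely used in the complexity-measures literature, the maximum Fisher's discriminant ratio (often labeled F1), in its normalized form 1 / (1 + max ratio). It is a minimal NumPy illustration, not the library's implementation: the function name `fisher_discriminant_f1` is hypothetical, and the abstract does not list which 22 measures the library covers, so treating F1 as one of them is an assumption.

```python
import numpy as np


def fisher_discriminant_f1(X, y):
    """Normalized maximum Fisher's discriminant ratio for a binary problem.

    Returns a value in [0, 1]; values near 0 mean at least one feature
    separates the classes well, values near 1 mean heavy class overlap.
    Hypothetical helper for illustration, not part of any library API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")

    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # between-class scatter, per feature
    within = np.zeros(X.shape[1])   # within-class scatter, per feature
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        between += Xc.shape[0] * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)

    # Per-feature discriminant ratio; constant features get ratio 0, and
    # features with zero within-class scatter but distinct means get inf.
    ratio = np.zeros_like(between)
    ratio[(within == 0) & (between > 0)] = np.inf
    mask = within > 0
    ratio[mask] = between[mask] / within[mask]

    return float(1.0 / (1.0 + ratio.max()))


# Synthetic data: class means 5 standard deviations apart vs. identical.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
X_easy = rng.normal(size=(200, 2)) + 5.0 * y_demo[:, None]
X_hard = rng.normal(size=(200, 2))

f1_easy = fisher_discriminant_f1(X_easy, y_demo)
f1_hard = fisher_discriminant_f1(X_hard, y_demo)
print(f"well separated: {f1_easy:.3f}, overlapping: {f1_hard:.3f}")
```

The well-separated problem should score much lower than the overlapping one. Because the score depends only on the training data, it can be recomputed after each resampling of the training set without fitting a classifier, which is the use the abstract describes.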

