LG · OC · ML · Jan 22, 2020

Optimal binning: mathematical programming formulation

arXiv:2001.08025v3 · 36 citations · Has code
Originality: Incremental advance
AI Analysis

This work provides a rigorous and extensible approach to data preprocessing for machine learning. It is an incremental advance, however: it builds on existing binning methods by adding new constraints and algorithmic enhancements.

The authors tackle optimal binning, that is, variable discretization for binary, continuous, and multi-class targets. They introduce a convex mixed-integer programming formulation with new constraints and implement it in an open-source Python library called OptBinning.
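A generic interval set-partitioning model over pre-bins makes the idea of a mixed-integer formulation concrete. The sketch below illustrates that idea under stated assumptions. It is not the paper's formulation, and its variables, objective, and constraint set are not taken from the source. The notation is assumed and covers a binary target:

- $1,\dots,n$ are pre-bins from an initial fine split.
- $x_{ij}\in\{0,1\}$ selects the merged bin covering pre-bins $i$ through $j$.
- $w_{ij}$ is that bin's weight of evidence (WoE), and $V_{ij} = (p^{e}_{ij} - p^{ne}_{ij})\,w_{ij}$ is its information value, where $p^{e}_{ij}$ and $p^{ne}_{ij}$ are the bin's shares of all events and non-events.
- $s_{ij}$ is its record count, $\tau$ is a minimum bin size, and $b_{\min}, b_{\max}$ bound the number of bins.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{1 \le i \le j \le n} V_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{i=1}^{k} \sum_{j=k}^{n} x_{ij} = 1, && k = 1,\dots,n && \text{(each pre-bin lies in exactly one bin)}\\
& b_{\min} \le \sum_{1 \le i \le j \le n} x_{ij} \le b_{\max} && && \text{(number of bins)}\\
& x_{ij} = 0, && \text{if } s_{ij} < \tau && \text{(minimum bin size)}\\
& x_{ij} + x_{j+1,\ell} \le 1, && \text{if } w_{ij} \ge w_{j+1,\ell},\; 1 \le i \le j < \ell \le n && \text{(strictly ascending WoE)}\\
& x_{ij} \in \{0,1\}.
\end{aligned}
```

Because $V_{ij}$, $w_{ij}$ and $s_{ij}$ are precomputed for every candidate interval, the model is linear in $x$. The last constraint forbids two consecutive selected bins from breaking the ascending trend. Reversing its inequality gives a descending trend.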

Optimal binning is the optimal discretization of a variable into bins given a discrete or continuous numeric target. We present a rigorous and extensible mathematical programming formulation for solving the optimal binning problem for binary, continuous and multi-class target types, incorporating constraints not previously addressed. For all three target types, we introduce a convex mixed-integer programming formulation. Several algorithmic enhancements, such as automatic determination of the most suitable monotonic trend via a machine-learning-based classifier, and implementation aspects are thoughtfully discussed. The new mathematical programming formulations are carefully implemented in the open-source Python library OptBinning.
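The abstract does not give solver details, so the following is a minimal sketch under explicit assumptions. It swaps the MIP for an exact dynamic program over pre-bins, which is a different technique from the paper's mixed-integer programming. The program maximizes information value (IV) for a binary target subject to a strictly monotonic WoE trend. It picks the trend by solving both directions and keeping the better result, a naive stand-in for the paper's ML-based trend classifier. The function names (`woe_iv`, `optimal_monotonic_binning`, `auto_trend_binning`) and the WoE convention ln(event share / non-event share) are hypothetical choices, not OptBinning's API.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[int, int]  # half-open range [i, j) of pre-bin indices


def woe_iv(e: int, ne: int, total_e: int, total_ne: int) -> Optional[Tuple[float, float]]:
    """WoE and IV contribution of one bin; None if a class is missing."""
    if e == 0 or ne == 0:
        return None  # WoE is undefined when a share is zero
    pe, pn = e / total_e, ne / total_ne
    woe = math.log(pe / pn)  # assumed convention: ln(event share / non-event share)
    return woe, (pe - pn) * woe


def optimal_monotonic_binning(events: Sequence[int], nonevents: Sequence[int],
                              trend: str = "ascending",
                              min_size: int = 1) -> Optional[Tuple[float, List[Interval]]]:
    """Exact DP (not the paper's MIP): maximize total IV over merges of adjacent
    pre-bins so that WoE is strictly monotonic across the resulting bins."""
    n = len(events)
    total_e, total_ne = sum(events), sum(nonevents)
    # Prefix sums give O(1) event/non-event counts for any candidate bin.
    ce, cn = [0], [0]
    for e, ne in zip(events, nonevents):
        ce.append(ce[-1] + e)
        cn.append(cn[-1] + ne)

    # Precompute (WoE, IV) for every feasible candidate bin [i, j).
    stats: Dict[Interval, Tuple[float, float]] = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            e, ne = ce[j] - ce[i], cn[j] - cn[i]
            r = woe_iv(e, ne, total_e, total_ne)
            if e + ne >= min_size and r is not None:
                stats[(i, j)] = r

    # best[(i, j)]: max IV of a valid binning of pre-bins [0, j) whose last bin is [i, j).
    best: Dict[Interval, float] = {}
    parent: Dict[Interval, Optional[Interval]] = {}
    for j in range(1, n + 1):
        for i in range(j):
            if (i, j) not in stats:
                continue
            woe, iv = stats[(i, j)]
            if i == 0:  # the first bin has no predecessor to compare against
                best[(i, j)], parent[(i, j)] = iv, None
                continue
            choice: Optional[Tuple[float, int]] = None
            for k in range(i):  # candidate previous bin [k, i)
                if (k, i) not in best:
                    continue
                prev_woe = stats[(k, i)][0]
                ok = woe > prev_woe if trend == "ascending" else woe < prev_woe
                if ok and (choice is None or best[(k, i)] > choice[0]):
                    choice = (best[(k, i)], k)
            if choice is not None:
                best[(i, j)], parent[(i, j)] = choice[0] + iv, (choice[1], i)

    finals = [key for key in best if key[1] == n]
    if not finals:
        return None  # no feasible binning under these constraints
    node: Optional[Interval] = max(finals, key=lambda key: best[key])
    total = best[node]
    bins: List[Interval] = []
    while node is not None:  # follow parent links back to the first bin
        bins.append(node)
        node = parent[node]
    return total, bins[::-1]


def auto_trend_binning(events, nonevents, min_size=1):
    """Naive trend selection: solve both directions and keep the higher IV."""
    results = []
    for trend in ("ascending", "descending"):
        r = optimal_monotonic_binning(events, nonevents, trend, min_size)
        if r is not None:
            results.append((r[0], trend, r[1]))
    return max(results) if results else None


if __name__ == "__main__":
    # Toy pre-bins whose event rate breaks monotonicity at pre-bins 1 and 2.
    print(auto_trend_binning([1, 3, 2, 4], [4, 2, 3, 1]))
```

On the toy input, the sketch merges pre-bins 1 and 2 and returns an ascending binning with bins `[(0, 1), (1, 3), (3, 4)]`. The DP is exact for this toy objective for two reasons: IV is additive over bins, and monotonicity only compares adjacent bins. However, it runs in O(n^3) time for n pre-bins. It also handles only a binary target and a single minimum-size constraint.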

Code implementations: 1 repo
