LG | Jun 28, 2017

CatBoost: unbiased boosting with categorical features

arXiv:1706.09516v5 | 263 citations
Originality: Highly original
AI Analysis

The paper addresses a fundamental issue in gradient boosting that affects machine learning practitioners, and it offers a novel solution with broad empirical improvements.

The paper tackles prediction shift in gradient boosting caused by target leakage, introducing CatBoost with ordered boosting and categorical feature processing to outperform other boosting implementations on various datasets.

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that proposed algorithms solve it effectively, leading to excellent empirical results.
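The abstract's point about target leakage in categorical feature processing can be made concrete with a minimal sketch of an ordered target statistic: each example's category is encoded using only the targets of examples that precede it in a random permutation, so an example's own label never enters its own encoding. The function name `ordered_target_statistics` and the parameters `prior`, `prior_weight`, and `seed` are illustrative assumptions, not the CatBoost library's API, and the sketch omits details of the real implementation such as multiple permutations.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode a categorical column with an ordered target statistic (illustrative sketch).

    Each example is encoded from the targets of examples that come before it
    in a random permutation, smoothed toward `prior`.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # the random permutation defines the "history"

    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running count of examples seen so far, per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Encode BEFORE updating the running statistics, so the example's
        # own target is excluded from its encoding.
        encoded[idx] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        sums[cat] += targets[idx]
        counts[cat] += 1

    return encoded, order


if __name__ == "__main__":
    cats = ["a", "b", "a", "a", "b"]
    ys = [1, 0, 1, 0, 1]
    enc, order = ordered_target_statistics(cats, ys, prior=sum(ys) / len(ys))
    for i in order:
        print(i, cats[i], ys[i], round(enc[i], 3))
```

The first example in the permutation has no history and therefore receives the prior. A plain (non-ordered) target statistic would instead average over all examples of the category, including the one being encoded, which is the kind of leakage the abstract refers to.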

Code Implementations: 10 repos

Data from Papers with Code (CC-BY-SA-4.0)

