ML LG EM | Feb 20, 2025

Policy-Oriented Binary Classification: Improving (KD-)CART Final Splits for Subpopulation Targeting

Lei Bill Wang, Zhenbang Jiao, Fangyi Wang

arXiv:2502.15072v2 | 4.5 | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses a specific problem for policymakers using binary classification trees, offering incremental improvements over existing methods.

The paper shows that CART and KD-CART are suboptimal for Latent Probability Classification (LPC) in policy targeting. It proposes MDFS, which strictly dominates both under the unique intersect assumption, along with PFS and wEFS, which relax that assumption. On real-world datasets, MDFS produces policies that target more vulnerable subpopulations.

Policymakers often use recursive binary split rules to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. We call such problems Latent Probability Classification (LPC). Practitioners typically employ Classification and Regression Trees (CART) for LPC. We prove that in the context of LPC, classic CART and the knowledge distillation method whose student model is a CART (referred to as KD-CART) are suboptimal. We propose Maximizing Distance Final Split (MDFS), which generates split rules that strictly dominate CART/KD-CART under the unique intersect assumption. MDFS identifies the unique best split rule, is consistent, and targets more vulnerable subpopulations than CART/KD-CART. To relax the unique intersect assumption, we additionally propose Penalized Final Split (PFS) and weighted Empirical risk Final Split (wEFS). Through extensive simulation studies, we demonstrate that the proposed methods predominantly outperform CART/KD-CART. When applied to real-world datasets, MDFS generates policies that target more vulnerable subpopulations than CART/KD-CART.
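To make the LPC setup concrete, here is a minimal Python sketch of a single final split on one feature. It is an illustration under stated assumptions, not the paper's algorithm: the latent probability curve `eta`, the threshold `c = 0.7`, and especially `distance_score` (a weighted distance between each node's mean probability and the threshold) are hypothetical choices made for this example and should not be read as the paper's definition of MDFS. The sketch compares where a Gini-based CART criterion and a threshold-aware criterion place the split, and which units each resulting policy would target.

```python
from typing import Callable, List, Tuple


def node_means(eta: List[float], k: int) -> Tuple[float, float]:
    """Mean event probability in the left (eta[:k]) and right (eta[k:]) nodes."""
    left, right = eta[:k], eta[k:]
    return sum(left) / len(left), sum(right) / len(right)


def gini_score(eta: List[float], k: int) -> float:
    """Negative weighted Gini impurity (higher is better): the CART criterion."""
    p_l, p_r = node_means(eta, k)
    w_l = k / len(eta)
    return -(w_l * 2 * p_l * (1 - p_l) + (1 - w_l) * 2 * p_r * (1 - p_r))


def distance_score(eta: List[float], k: int, c: float) -> float:
    """Hypothetical threshold-aware criterion (an assumption for illustration):
    weighted distance between each node's mean probability and the threshold c."""
    p_l, p_r = node_means(eta, k)
    w_l = k / len(eta)
    return w_l * abs(p_l - c) + (1 - w_l) * abs(p_r - c)


def best_split(eta: List[float], score: Callable[[List[float], int], float]) -> int:
    """Index k (1 <= k < n) that maximizes the given split score."""
    return max(range(1, len(eta)), key=lambda k: score(eta, k))


def targeted(eta: List[float], k: int, c: float) -> List[int]:
    """Indices of units that fall in a node whose mean probability exceeds c."""
    p_l, p_r = node_means(eta, k)
    out: List[int] = []
    if p_l > c:
        out.extend(range(0, k))
    if p_r > c:
        out.extend(range(k, len(eta)))
    return out


# Hypothetical population: units sorted by a feature x in (0, 1),
# with latent event probability eta(x) = x^2 and policy threshold c.
n = 200
xs = [(i + 0.5) / n for i in range(n)]
eta = [x ** 2 for x in xs]
c = 0.7

k_cart = best_split(eta, gini_score)
k_dist = best_split(eta, lambda e, k: distance_score(e, k, c))

for name, k in [("CART (Gini)", k_cart), ("threshold-aware", k_dist)]:
    units = targeted(eta, k, c)
    print(f"{name}: split at x = {k / n:.3f}, targets {len(units)} units")
```

The Gini criterion never sees `c`, so its split location is the same for every threshold. A criterion that depends on `c` can move the split in response to the policy threshold, which is the general intuition behind treating the final split differently in LPC.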
