On the Robustness of Decision Tree Learning under Label Noise
This paper addresses label noise in classifier learning, a problem that matters to practitioners, but the contribution is incremental because it builds on existing theoretical analysis.
The paper analyzes the robustness of popular decision tree algorithms to symmetric label noise, showing they are robust under large sample sizes and providing sample complexity bounds.
In most practical classifier-learning problems, the training data suffers from label noise. Hence, it is important to understand how robust a learning algorithm is to such noise. This paper presents a theoretical analysis showing that many popular decision tree algorithms are robust to symmetric label noise when the sample size is large. We also present sample complexity results that bound the sample size needed for this robustness to hold with high probability. Through extensive simulations, we illustrate this robustness.
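The intuition behind robustness at a single leaf can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's experimental setup or its bounds. It assumes binary labels and symmetric noise of rate eta < 0.5, and all function names (flip_symmetric, noisy_fraction, majority_label, hoeffding_leaf_size) are hypothetical. If a fraction p of the clean labels at a leaf is 1, the expected noisy fraction is eta + (1 - 2*eta)*p, which lies on the same side of 1/2 as p, so the majority label is preserved once the leaf holds enough samples. The last helper applies a generic one-sided Hoeffding bound to suggest how many samples might suffice; it is not the sample complexity result stated in the paper.

```python
import math
import random


def flip_symmetric(labels, eta, rng):
    """Flip each binary label independently with probability eta."""
    return [1 - y if rng.random() < eta else y for y in labels]


def noisy_fraction(p, eta):
    """Expected fraction of 1-labels after symmetric noise of rate eta."""
    return eta + (1.0 - 2.0 * eta) * p


def majority_label(labels):
    """Majority vote over binary labels (ties go to 0)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def hoeffding_leaf_size(p, eta, delta):
    """Samples sufficient for the noisy majority vote to match the clean
    majority with probability >= 1 - delta, via a one-sided Hoeffding bound.
    Assumption: p != 0.5 and eta < 0.5, so the margin below is positive."""
    margin = (1.0 - 2.0 * eta) * abs(p - 0.5)
    return math.ceil(math.log(1.0 / delta) / (2.0 * margin ** 2))


# Simulate one leaf: 70% of clean labels are 1, noise rate 30%.
rng = random.Random(0)
p, eta, n = 0.7, 0.3, 20000
clean = [1 if rng.random() < p else 0 for _ in range(n)]
noisy = flip_symmetric(clean, eta, rng)

print("clean majority:", majority_label(clean))
print("noisy majority:", majority_label(noisy))
print("empirical noisy fraction:", sum(noisy) / n)
print("expected noisy fraction:", noisy_fraction(p, eta))
print("Hoeffding leaf size (delta=0.01):", hoeffding_leaf_size(p, eta, 0.01))
```

The margin (1 - 2*eta)*|p - 1/2| shrinks as eta approaches 1/2, so the number of samples this bound asks for grows quickly with the noise rate. This is consistent with robustness holding only when the sample size is large.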