Contrast Trees and Distribution Boosting
This work addresses the need for veracity in machine learning outputs used for decision-making, offering a new approach to assessing accuracy when standard validation methods do not apply.
The paper tackles the problem of assessing the accuracy of machine learning estimates when standard validation methods are not applicable. It introduces contrast trees to detect inaccuracies and boosted contrast trees to improve performance where inaccuracies are found. A special case, distribution boosting, enables assumption-free estimation of full probability distributions.
Machine learning methods are often applied, and their results reported, in cases where there is little to no information concerning the accuracy of the output. Simply because a computer program returns a result does not ensure its validity. If decisions are to be made based on such results, it is important to have some notion of their veracity. Contrast trees represent a new approach for assessing the accuracy of many types of machine learning estimates that are not amenable to standard (cross) validation methods. In situations where inaccuracies are detected, boosted contrast trees can often improve performance. A special case, distribution boosting, provides an assumption-free method for estimating the full probability distribution of an outcome variable given any set of joint input predictor variable values.
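As a rough illustration of the idea of locating where an estimate is inaccurate, the sketch below searches a single predictor for the one split whose left or right region shows the largest gap between an outcome y and a model estimate z. It is only a minimal sketch under stated assumptions: the function name `best_contrast_split`, the `min_leaf` parameter, and the use of |mean(y) - mean(z)| as the discrepancy measure are hypothetical choices made here for illustration and are not taken from the paper, which may use a different criterion and builds full trees rather than a single split.

```python
import numpy as np


def best_contrast_split(x, y, z, min_leaf=10):
    """Return (threshold, side, discrepancy) for the single split on x whose
    left or right region has the largest |mean(y) - mean(z)|.

    Illustrative only: the discrepancy measure and the one-split search are
    assumptions of this sketch, not the paper's procedure.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    order = np.argsort(x, kind="stable")
    xs = x[order]
    diff = (y - z)[order]          # per-observation gap between outcome and estimate
    n = len(xs)
    csum = np.cumsum(diff)         # running sums give region means in O(1) per split

    best = (None, None, 0.0)
    for i in range(min_leaf, n - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue               # cannot split between identical x values
        left = abs(csum[i - 1] / i)
        right = abs((csum[-1] - csum[i - 1]) / (n - i))
        side, disc = ("left", left) if left >= right else ("right", right)
        if disc > best[2]:         # strict '>' keeps the first split reaching the maximum
            best = ((xs[i - 1] + xs[i]) / 2.0, side, disc)
    return best


# Synthetic example: the estimate z matches y except for a bias where x >= 70.
x = np.arange(100, dtype=float)
y = np.zeros(100)
z = np.where(x >= 70, 1.0, 0.0)
threshold, side, discrepancy = best_contrast_split(x, y, z)
print(threshold, side, discrepancy)
```

In this toy setting the search isolates the region where the estimate is biased. A region with a large discrepancy flags where the estimate should not be trusted, which is the kind of diagnostic the abstract describes.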