Empirically Estimable Classification Bounds Based on a New Divergence Measure
This paper offers incremental improvements to classification error bounds, of interest to researchers in statistics and machine learning, with specific applications in speech pathology.
The paper tackles the problem of bounding binary classification error probability under both matched and mismatched training/test distributions by introducing a new non-parametric f-divergence measure, achieving improved theoretical bounds that are validated through pathological speech classification tasks.
Information divergence functions play a critical role in statistics and information theory. In this paper, we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum probability of error in binary classification, both when the training and test data are drawn from the same distribution and when there is some mismatch between the training and test distributions. We confirm the theoretical results by designing feature selection algorithms that use criteria derived from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.
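To make the idea of an empirically estimable bound concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the estimator or the exact form of the bounds, so everything here is an assumption: the divergence is estimated from the number of minimum spanning tree edges that join samples of opposite classes (a Friedman-Rafsky style statistic), and the bounds are assumed to take the form 1/2 - 1/2 sqrt(u) <= error <= 1/2 - 1/2 u, with u = 4pq D + (p - q)^2 for class priors p and q = 1 - p. The function names mst_edges, divergence_estimate, and error_bounds are hypothetical.

```python
import math


def mst_edges(points):
    """Edges (i, j) of a Euclidean minimum spanning tree, via Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update the best known connection for every remaining point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def divergence_estimate(x, y):
    """Assumed MST-based divergence estimate in [0, 1] (not given in the abstract).

    Counts MST edges over the pooled sample that connect a point of x to a
    point of y; few such edges suggest well-separated classes.
    """
    m, n = len(x), len(y)
    pooled = list(x) + list(y)
    cross = sum(1 for i, j in mst_edges(pooled) if (i < m) != (j < m))
    return max(0.0, 1.0 - cross * (m + n) / (2.0 * m * n))


def error_bounds(d, p=0.5):
    """Assumed (lower, upper) bounds on the minimum error for divergence d, prior p."""
    q = 1.0 - p
    u = 4.0 * p * q * d + (p - q) ** 2
    return 0.5 - 0.5 * math.sqrt(u), 0.5 - 0.5 * u
```

Under these assumptions, a feature selection criterion of the kind the abstract mentions could score each candidate feature subset by calling divergence_estimate on the two classes restricted to that subset and preferring subsets with the smallest upper bound from error_bounds. The prior p passed to error_bounds should match the class proportions of the samples used for the estimate, that is, m / (m + n).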