Classification Error Bound for Low Bayes Error Conditions in Machine Learning
This work offers theoretical insight into classification error bounds that is relevant to machine learning practitioners, particularly in domains such as speech recognition, but it is incremental in that it builds on existing error bound analyses.
The paper addresses the mismatch between the Bayes error and the model-based classification error by proposing a linear approximation of the classification error bound for low Bayes error conditions. It also extends the bound to sequences and uses the extended bounds to relate performance measures in automatic speech recognition, such as cross-entropy loss and word error rate.
In statistical classification and machine learning, classification error is an important performance measure, and it is minimized by the Bayes decision rule. In practice, the unknown true distribution in the Bayes decision rule is usually replaced with a model distribution estimated from the training data. This substitution introduces a mismatch between the Bayes error and the model-based classification error. In this work, we apply classification error bounds to study the relationship between this error mismatch and the Kullback-Leibler divergence in machine learning. Motivated by recent observations of low model-based classification errors in many machine learning tasks, which bound the Bayes error to be even lower, we propose a linear approximation of the classification error bound for low Bayes error conditions. We then discuss the bound for class priors and extend the classification error bound to sequences. Using automatic speech recognition as a representative machine learning application, this work analytically discusses the correlations among different performance measures using the extended bounds, including cross-entropy loss, language model perplexity, and word error rate.
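The quantities named in the abstract can be made concrete with a small numerical sketch. The example below is illustrative only and is not taken from the paper: the two probability tables are hypothetical, and the helper names `classification_error` and `kl_divergence` are introduced here for the illustration. It computes the Bayes error (deciding with the true distribution), the model-based classification error (deciding with a model distribution but measuring the error under the true one), their mismatch, and the Kullback-Leibler divergence between the two joint distributions.

```python
import numpy as np

# Hypothetical true joint distribution pr(c, x): 2 classes (rows), 3 observations (columns).
true_joint = np.array([[0.30, 0.15, 0.05],
                       [0.05, 0.10, 0.35]])

# Hypothetical model distribution q(c, x), standing in for an estimate from training data.
model_joint = np.array([[0.25, 0.10, 0.15],
                        [0.10, 0.20, 0.20]])


def classification_error(true_joint, decision_joint):
    """Error of the rule c(x) = argmax_c decision_joint[c, x], measured under true_joint."""
    decisions = decision_joint.argmax(axis=0)
    columns = np.arange(true_joint.shape[1])
    probability_correct = true_joint[decisions, columns].sum()
    return 1.0 - probability_correct


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats, for strictly positive tables."""
    return float(np.sum(p * np.log(p / q)))


# Deciding with the true distribution gives the Bayes error.
bayes_error = classification_error(true_joint, true_joint)
# Deciding with the model distribution gives the model-based classification error.
model_error = classification_error(true_joint, model_joint)
# The mismatch is non-negative because the Bayes decision rule is optimal.
mismatch = model_error - bayes_error
kl = kl_divergence(true_joint, model_joint)

print(f"Bayes error:       {bayes_error:.4f}")
print(f"Model-based error: {model_error:.4f}")
print(f"Error mismatch:    {mismatch:.4f}")
print(f"KL divergence:     {kl:.4f}")
```

In this toy setting the model distribution changes the decision for one observation, so the model-based error exceeds the Bayes error while the divergence between the two tables is positive. The paper's bounds concern how large such a mismatch can be for a given divergence; the sketch only evaluates the quantities involved and does not reproduce those bounds.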