A Non-Intrusive Correction Algorithm for Classification Problems with Corrupted Data
This work addresses the problem of data corruption in classification for machine learning practitioners. It offers a flexible solution that can be applied to a variety of models, though it appears incremental because it builds on existing correction methods.
The paper tackles multi-class classification with corrupted training data by proposing a non-intrusive correction algorithm that post-processes trained models. The authors prove that, for sufficiently large datasets, the corrected models deliver correct results as if the data were uncorrupted, and they show significantly better recovery on finite datasets.
A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction procedure to the model prediction. The correction procedure can be coupled with any approximator, such as logistic regression or neural networks of various architectures. When the training dataset is sufficiently large, we prove that the corrected models deliver correct classification results as if there were no corruption in the training data. For datasets of finite size, the corrected models produce significantly better recovery results than the models without the correction algorithm. All of the theoretical findings in the paper are verified by our numerical examples.
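The abstract does not spell out the correction procedure, so the following is only a minimal sketch of what a post-processing correction of this kind could look like, not the paper's algorithm. It assumes class-conditional label corruption with a known, invertible transition matrix, and it swaps in the standard idea of inverting that matrix on the predicted probabilities. The function name `correct_posteriors` and the matrix `transition` are hypothetical names introduced for illustration.

```python
import numpy as np


def correct_posteriors(noisy_probs, transition):
    """Map posteriors over corrupted labels back to posteriors over true labels.

    Assumptions (not taken from the paper):
      transition[i, j] = P(observed label = j | true label = i), known and invertible.
      noisy_probs[n, j] = trained model's estimate of P(observed label = j | x_n).
    """
    noisy_probs = np.asarray(noisy_probs, dtype=float)
    transition = np.asarray(transition, dtype=float)

    # Under the assumed noise model: noisy = clean @ transition,
    # so clean is recovered by solving transition.T @ clean.T = noisy.T.
    clean = np.linalg.solve(transition.T, noisy_probs.T).T

    # Finite-sample estimates can fall slightly outside the simplex;
    # clip negatives and renormalize each row to sum to one.
    clean = np.clip(clean, 0.0, None)
    return clean / clean.sum(axis=1, keepdims=True)


# Hypothetical two-class example: 20% of class 0 and 30% of class 1 are mislabeled.
transition = np.array([[0.8, 0.2],
                       [0.3, 0.7]])
clean_probs = np.array([[0.90, 0.10],
                        [0.45, 0.55]])

# What an ideal model trained on corrupted labels would predict.
noisy_probs = clean_probs @ transition

corrected = correct_posteriors(noisy_probs, transition)
print("noisy argmax:    ", noisy_probs.argmax(axis=1))
print("corrected argmax:", corrected.argmax(axis=1))
```

In this toy setting the second sample is assigned to class 0 by the uncorrected posterior and to class 1 after correction, which matches its true posterior. The trained model itself is never modified, which is the sense in which such a step is non-intrusive. In practice the transition matrix would have to be estimated, and how the paper handles that is not stated in the abstract.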