W2WNet: a two-module probabilistic Convolutional Neural Network with embedded data cleansing functionality
This work addresses the problem of noisy datasets in image classification, particularly in real-world settings such as medical imaging. The contribution is incremental, since it builds on existing CNN and Bayesian methods.
The paper tackles image degradation and mislabelling in datasets, both of which harm CNN performance. It proposes W2WNet, a two-module CNN that identifies and discards spurious images during training and reports prediction confidence at inference. The authors report improved classification accuracy on public benchmarks and on a real-world case study.
Convolutional Neural Networks (CNNs) are typically assumed to be trained on high-quality annotated datasets. In many real-world scenarios, however, such quality is very hard to obtain, and datasets may be affected by various forms of image degradation and mislabelling. This degrades the performance of standard CNNs during both training and inference. To address this issue we propose Wise2WipedNet (W2WNet), a new two-module Convolutional Neural Network, in which a Wise module exploits Bayesian inference to identify and discard spurious images during training, and a Wiped module performs the final classification while reporting the prediction confidence at inference time. We demonstrate the effectiveness of our solution on a number of public benchmarks covering different image classification tasks, as well as on a real-world case study on histological image analysis. Overall, our experiments show that W2WNet is able to identify image degradation and mislabelling issues at both training and inference time, with a positive impact on the final classification accuracy.
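The abstract does not specify how the Wise module turns Bayesian inference into a keep-or-discard decision. As a minimal sketch of one common way such a filter could work, the example below assumes that each training image has been passed through a stochastic network several times (for instance with Monte Carlo dropout, which is an assumption here and not a detail confirmed by the text), that the resulting class probabilities are averaged, and that an image is discarded when the averaged prediction is too uncertain or disagrees with its annotated label. The function names, the entropy criterion, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence


def predictive_mean(mc_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities over all stochastic forward passes."""
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    return [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]


def predictive_entropy(mean_probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the averaged predictive distribution."""
    return -sum(p * math.log(p) for p in mean_probs if p > 0.0)


def keep_sample(mc_probs: Sequence[Sequence[float]],
                label: int,
                max_entropy: float) -> bool:
    """Hypothetical filtering rule, not the paper's confirmed criterion.

    Keep an image only if the averaged prediction is confident enough
    (entropy at or below max_entropy) and agrees with the annotated label.
    """
    mean = predictive_mean(mc_probs)
    predicted = max(range(len(mean)), key=mean.__getitem__)
    return predictive_entropy(mean) <= max_entropy and predicted == label


# Toy two-class data: three stochastic passes per image (made-up numbers).
confident = [[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]]  # passes agree
ambiguous = [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]]  # passes disagree

THRESHOLD = 0.5  # assumed entropy cut-off in nats; ln(2) is the two-class maximum

print(keep_sample(confident, label=0, max_entropy=THRESHOLD))  # clean image
print(keep_sample(confident, label=1, max_entropy=THRESHOLD))  # likely mislabelled
print(keep_sample(ambiguous, label=0, max_entropy=THRESHOLD))  # likely degraded
```

Under these assumptions, the same averaged distribution could also serve as the confidence signal at inference time, for example by reporting the predictive entropy alongside the predicted class.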