Pruning Distorted Images in MNIST Handwritten Digits
This work addresses the challenge of noisy data in digit recognition for machine learning practitioners, but it is incremental as it builds on existing methods for data cleaning.
The paper tackled the problem of reduced accuracy in handwritten digit recognition due to distorted images in the MNIST dataset by proposing a two-stage deep learning approach to filter out distorted digits, achieving an accuracy rate of over 99.5% on the testing dataset.
Recognizing handwritten digits is a challenging task primarily due to the diversity of writing styles and the presence of noisy images. The widely used MNIST dataset, which is commonly employed as a benchmark for this task, includes distorted digits with irregular shapes, incomplete strokes, and varying skew in both the training and testing datasets. Consequently, these factors contribute to reduced accuracy in digit recognition. To overcome this challenge, we propose a two-stage deep learning approach. In the first stage, we create a simple neural network to identify distorted digits within the training set. This model serves to detect and filter out such distorted and ambiguous images. In the second stage, we exclude these identified images from the training dataset and proceed to retrain the model using the filtered dataset. This process aims to improve the classification accuracy and confidence levels while mitigating issues of underfitting and overfitting. Our experimental results demonstrate the effectiveness of the proposed approach, achieving an accuracy rate of over 99.5% on the testing dataset. This significant improvement showcases the potential of our method in enhancing digit classification accuracy. In our future work, we intend to explore the scalability of this approach and investigate techniques to further enhance accuracy by reducing the size of the training data.