CV | LG | Apr 19, 2024

Towards Robust Ferrous Scrap Material Classification with Deep Learning and Conformal Prediction

arXiv:2404.13002v1 | 1 citation | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of automating scrap classification for the steel industry and building operator trust in it, though it is incremental in that it applies existing methods to a specific domain.

The paper tackles the problem of classifying ferrous scrap materials in steel production by integrating conformal prediction with computer vision models to quantify uncertainty and improve reliability, achieving over 95% accuracy with the Swin Transformer model.

In the steel production domain, recycling ferrous scrap is essential for environmental and economic sustainability, as it reduces both energy consumption and greenhouse gas emissions. However, the classification of scrap materials poses a significant challenge, requiring advancements in automation technology. Additionally, building trust among human operators is a major obstacle. Traditional approaches often fail to quantify uncertainty and lack clarity in model decision-making, which complicates acceptance. In this article, we describe how conformal prediction can be employed to quantify uncertainty and add robustness in scrap classification. We have adapted the Split Conformal Prediction technique to seamlessly integrate with state-of-the-art computer vision models, such as the Vision Transformer (ViT), Swin Transformer, and ResNet-50, while also incorporating Explainable Artificial Intelligence (XAI) methods. We evaluate the approach using a comprehensive dataset of 8147 images spanning nine ferrous scrap classes. The application of the Split Conformal Prediction method allowed for the quantification of each model's uncertainties, which enhanced the understanding of predictions and increased the reliability of the results. Specifically, the Swin Transformer model demonstrated more reliable outcomes than the others, as evidenced by its smaller average prediction-set size and an average classification accuracy exceeding 95%. Furthermore, the Score-CAM method proved highly effective in clarifying visual features, significantly enhancing the explainability of the classification decisions.
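For readers unfamiliar with the technique, the following is a minimal sketch of generic split conformal prediction for a classifier, not the paper's adapted variant. It assumes the model outputs softmax probabilities and uses the common nonconformity score 1 - p(true class); the function names (`calibrate_threshold`, `prediction_sets`, `average_set_size`) and the toy numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the conformal threshold q_hat from a held-out calibration set."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score (an assumption): 1 - probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class gets included
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, q_hat):
    """Boolean mask of shape (n_test, n_classes): True where a class is in the set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= q_hat

def average_set_size(sets):
    """Mean number of classes per prediction set (smaller means less uncertainty)."""
    return float(sets.sum(axis=1).mean())

# Toy example with 3 classes (illustrative numbers only).
cal_probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
cal_labels = [0, 1, 2]
q_hat = calibrate_threshold(cal_probs, cal_labels, alpha=0.25)
sets = prediction_sets([[0.5, 0.45, 0.05]], q_hat)
print(q_hat, sets, average_set_size(sets))
```

Under the usual exchangeability assumption, sets built this way contain the true class with probability at least 1 - alpha. The average set size is the kind of quantity the abstract uses to compare models: a model that reaches the same coverage with smaller sets is less uncertain.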
