Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation
The work addresses data quality issues in real-world scenarios, particularly in biomedical applications, but appears incremental because it builds on existing strategies such as semi-supervised learning.
The paper tackles the problem of biased or ambiguous data in image classification by introducing a novel annotation strategy for generating high-quality labels, validated with over 250,000 annotations in a biomedical imaging case study.
In the field of image classification, existing methods often struggle with biased or ambiguous data, a prevalent issue in real-world scenarios. Current strategies, including semi-supervised learning and class blending, offer partial solutions but lack a definitive resolution. Addressing this gap, our paper introduces a novel strategy for generating high-quality labels in challenging datasets. Central to our approach is a clearly designed flowchart, based on a broad literature review, which enables the creation of reliable labels. We validate our methodology through a rigorous real-world test case in the biomedical field, specifically in deducing height reduction from vertebral imaging. Our empirical study, leveraging over 250,000 annotations, demonstrates the effectiveness of our strategy's decisions compared to their alternatives.