IV · AI · CV · Aug 10, 2023

Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

arXiv:2308.05862v1 · 88 citations · h-index: 36
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for accurate and efficient AI tools for automated abdominal disease diagnosis, particularly tools that reduce annotation requirements in medical imaging. The contribution is incremental, however, since it applies existing AI methods to a new challenge and dataset.

The paper tackles automated abdominal organ quantification in medical imaging through the FLARE22 Challenge, which benchmarked AI algorithms on a large multinational dataset. The best algorithms reached a median Dice Similarity Coefficient (DSC) of 90.0% using only 50 labeled and 2000 unlabeled scans, and generalized well to external validation sets, with median DSC ranging from 88.3% to 90.9%.

Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automate this process. However, most existing AI algorithms rely on large numbers of expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations, we organized the FLARE 2022 Challenge, the largest abdominal organ analysis challenge to date, to benchmark fast, low-resource, accurate, annotation-efficient, and generalized AI algorithms. We constructed an intercontinental and multinational dataset from more than 50 medical groups, including Computed Tomography (CT) scans covering different races, diseases, phases, and manufacturers. We independently validated that a set of AI algorithms achieved a median Dice Similarity Coefficient (DSC) of 90.0% by using 50 labeled scans and 2000 unlabeled scans, which can significantly reduce annotation requirements. The best-performing algorithms successfully generalized to holdout external validation sets, achieving a median DSC of 89.5%, 90.9%, and 88.3% on North American, European, and Asian cohorts, respectively. They also enabled automatic extraction of key organ biology features, which were labor-intensive to obtain with traditional manual measurements. This opens the potential to use unlabeled data to boost performance and alleviate annotation shortages for modern AI models.
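All headline results above are reported as median DSC. The sketch below illustrates the metric, DSC = 2|P ∩ G| / (|P| + |G|) for a predicted voxel set P and a ground-truth voxel set G of one organ. It is a minimal illustration, not the challenge's official evaluation code. It assumes flattened integer label maps (0 = background, other integers = organ IDs, a hypothetical encoding) and a median taken over per-organ scores. The function names `dice` and `median_dsc` are illustrative, and scoring an organ absent from both masks as 1.0 is an assumed convention. The official evaluation may aggregate over cases or handle empty organs differently.

```python
from statistics import median
from typing import Iterable, Sequence


def dice(pred: Sequence[int], gt: Sequence[int], label: int) -> float:
    """DSC for one organ label: 2|P ∩ G| / (|P| + |G|).

    P and G are the voxels assigned `label` in the prediction and the
    ground truth (both given as flattened label maps of equal length).
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same number of voxels")
    p_count = sum(1 for v in pred if v == label)
    g_count = sum(1 for v in gt if v == label)
    overlap = sum(1 for p, g in zip(pred, gt) if p == label and g == label)
    if p_count + g_count == 0:
        # Organ absent from both masks: scored as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * overlap / (p_count + g_count)


def median_dsc(pred: Sequence[int], gt: Sequence[int], labels: Iterable[int]) -> float:
    """Median of the per-organ DSC values (hypothetical aggregation over organs)."""
    return median([dice(pred, gt, k) for k in labels])


if __name__ == "__main__":
    # Toy 6-voxel "scan" with two organs (labels 1 and 2).
    pred = [0, 1, 1, 2, 2, 0]
    gt = [0, 1, 2, 2, 2, 0]
    print(dice(pred, gt, 1), dice(pred, gt, 2), median_dsc(pred, gt, [1, 2]))
```

In the toy example, organ 1 scores 2·1/(2+1) ≈ 0.667 and organ 2 scores 2·2/(2+3) = 0.8, so their median is about 0.733. A reported median DSC of 90.0% corresponds to a value of 0.900 on this scale.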

Code Implementations: 2 repos
