IV · CV · Sep 20, 2024

Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation?

arXiv:2409.13548v4 · 1 citation · h-index: 16 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work targets lesion segmentation accuracy on PET/CT data. It is incremental, building on existing data-centric methods.

The study tackles false positives in PET/CT lesion segmentation by trimming the training dataset: the easiest samples, as measured by model loss, are removed and the model is retrained from scratch. The reported result is a lower false negative volume and a higher Dice score than the baseline on the preliminary test set.

In this work, we describe our approach to competing in the autoPET3 data-centric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that, on the autoPETIII dataset, a model trained on the entire dataset exhibits undesirable behavior: it produces a large number of false positives, particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset, as measured by the model loss, before retraining from scratch. Using the proposed approach, we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and Dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.
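The trimming step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `trim_easiest`, the `drop_fraction` parameter, the sample IDs, and the loss values are all hypothetical, and the abstract does not state how many samples were removed. It assumes one scalar loss per training sample has already been computed with a model trained on the full dataset.

```python
from typing import Dict, List


def trim_easiest(per_sample_loss: Dict[str, float], drop_fraction: float) -> List[str]:
    """Return the sample IDs to keep after dropping the lowest-loss (easiest) ones.

    Hypothetical helper: `drop_fraction` is an assumed knob, not a value
    taken from the paper.
    """
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    # Rank sample IDs from easiest (lowest loss) to hardest (highest loss).
    ranked = sorted(per_sample_loss, key=per_sample_loss.get)
    # Number of easiest samples to discard, rounded down.
    n_drop = int(len(ranked) * drop_fraction)
    return ranked[n_drop:]


# Made-up losses for four made-up cases, purely for illustration.
losses = {"case_a": 0.02, "case_b": 0.45, "case_c": 0.10, "case_d": 0.80}
kept = trim_easiest(losses, drop_fraction=0.25)
print(kept)
```

The kept IDs would then define the reduced training set on which a fresh model is trained from scratch, as the abstract describes.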

Code Implementations: 1 repo
