CV | DL | Feb 5, 2024

[Citation needed] Data usage and citation practices in medical imaging conferences

arXiv:2402.03003v2 | 4 citations | h-index: 27 | Has Code | MIDL
AI Analysis

This work addresses the difficulty of tracking dataset usage in medical imaging research, but it is incremental: it focuses on tool development and analysis without proposing a new citation standard.

The study tackled the problem of tracking dataset usage in medical imaging papers by analyzing the citation and mention of 20 public datasets in MICCAI and MIDL conferences from 2013 to 2023, finding a concentration of usage on a limited set of datasets and highlighting varied citing practices that complicate automation.

Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets. However, there is no adopted standard for citing the datasets used in scientific papers, which makes dataset usage difficult to track. In this work, we present two open-source tools we created that could help with the detection of dataset usage: a pipeline (https://github.com/TheoSourget/Public_Medical_Datasets_References) using OpenAlex and full-text analysis, and a PDF annotation software (https://github.com/TheoSourget/pdf_annotator) used in our study to manually label the presence of datasets. We applied both tools in a study of the usage of 20 publicly available medical datasets in papers from MICCAI and MIDL. We compute the proportion, and its evolution between 2013 and 2023, of 3 types of presence in a paper: cited, mentioned in the full text, and both cited and mentioned. Our findings demonstrate that usage is concentrated on a limited set of datasets. We also highlight differing citing practices, which make automated tracking difficult.
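The three presence types named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that each paper has already been reduced to two boolean flags for a given dataset (whether the dataset's reference appears in the bibliography, and whether its name appears in the full text), and it treats the three types as mutually exclusive, which is an interpretation rather than something the abstract states. The function names, labels, and toy records below are hypothetical.

```python
from collections import Counter, defaultdict


def classify_presence(cited: bool, mentioned: bool) -> str:
    """Map two boolean flags to one presence label (labels are hypothetical)."""
    if cited and mentioned:
        return "cited_and_mentioned"
    if cited:
        return "cited_only"
    if mentioned:
        return "mentioned_only"
    return "absent"


def presence_proportions_by_year(records):
    """Compute, per year, the share of papers falling under each presence label.

    `records` is an iterable of (year, cited, mentioned) tuples, one per paper.
    """
    counts = defaultdict(Counter)
    for year, cited, mentioned in records:
        counts[year][classify_presence(cited, mentioned)] += 1

    proportions = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        proportions[year] = {label: n / total for label, n in counter.items()}
    return proportions


# Made-up toy records for one dataset; these are not results from the study.
toy_records = [
    (2022, True, True),
    (2022, False, True),
    (2023, True, False),
    (2023, True, True),
    (2023, False, False),
    (2023, True, True),
]
toy_result = presence_proportions_by_year(toy_records)
for year in sorted(toy_result):
    print(year, toy_result[year])
```

Keeping "cited only" and "mentioned only" as separate labels makes the gap between formal citation and in-text mention visible, which is the kind of varied citing practice the abstract says complicates automated tracking.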

Code Implementations: 2 repos
