CV · Dec 4, 2020

How Many Annotators Do We Need? -- A Study on the Influence of Inter-Observer Variability on the Reliability of Automatic Mitotic Figure Assessment

Frauke Wilm, Christof A. Bertram, Christian Marzahl, Alexander Bartel, Taryn A. Donovan, Charles-Antoine Assenmacher, Kathrin Becker, Mark Bennett, Sarah Corner, Brieuc Cossic, Daniela Denk, Martina Dettwiler

arXiv:2012.02495v2 · 31 citations

Originality: Incremental advance

AI Analysis

This research addresses how to improve the reliability of deep learning algorithms for tumor prognostication in pathology by determining how many expert annotators are needed to build a reliable training database.

This study investigated the impact of inter-observer variability on the reliability of automatic mitotic figure assessment. It found that training on the consensus of three annotators markedly improved the algorithm's F1 scores and made them more consistent than training on databases labelled by individual annotators, while adding further annotators yielded only minor improvements.
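For readers unfamiliar with how an F1 score is obtained for a detection task like this one, the sketch below shows one common approach: predicted mitotic figure centers are matched to ground-truth centers within a distance threshold, and F1 is computed from the resulting true positives (TP), false positives (FP) and false negatives (FN). The 25-pixel radius, the greedy nearest-match strategy, and the function names `match_detections` and `f1_score` are all assumptions for illustration, not taken from the paper; an optimal assignment (e.g. the Hungarian algorithm) can give slightly different counts.

```python
import math

def match_detections(preds, truths, radius=25.0):
    """Greedily match each predicted center to the nearest unmatched
    ground-truth center within `radius` pixels (assumed threshold).
    Returns the counts (tp, fp, fn)."""
    unmatched = list(truths)  # ground-truth centers not yet claimed
    tp = 0
    for px, py in preds:
        best_idx, best_dist = None, radius
        for i, (tx, ty) in enumerate(unmatched):
            d = math.hypot(px - tx, py - ty)
            if d <= best_dist:
                best_idx, best_dist = i, d
        if best_idx is not None:
            unmatched.pop(best_idx)  # each ground truth can be matched once
            tp += 1
    fp = len(preds) - tp   # predictions without a matching ground truth
    fn = len(unmatched)    # ground truths that no prediction matched
    return tp, fp, fn

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

preds = [(10, 10), (100, 100), (300, 300)]
truths = [(12, 9), (105, 98), (500, 500)]
counts = match_detections(preds, truths)
print(counts, round(f1_score(*counts), 3))  # (2, 1, 1) 0.667
```

Here two predictions fall within the radius of a ground-truth figure, one prediction is spurious and one figure is missed, giving F1 = 4 / 6 ≈ 0.667.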

The density of mitotic figures in histologic sections is a prognostically relevant characteristic for many tumours. Because mitotic figure assessment varies considerably between pathologists, deep learning-based algorithms are a promising way to improve tumour prognostication. Pathologists are the gold standard for database development; however, labelling errors may hamper the development of accurate algorithms. In the present work, we evaluated the benefit of multi-expert consensus (n = 3, 5, 7, 9, 11) on algorithmic performance. While training with individual databases resulted in highly variable F1 scores, performance was notably higher and more consistent when using the consensus of three annotators. Adding more annotators resulted in only minor improvements. We conclude that databases labelled by a small number of pathologists with high label accuracy may offer the best compromise between high algorithmic performance and time investment.
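The consensus sizes listed above are all odd, which fits a majority-vote scheme. The sketch below illustrates such a vote under explicit assumptions: each annotator is represented as a set of IDs of pre-identified candidate cells they labelled as mitotic figures, and a candidate enters the consensus when more than half of the annotators labelled it. The function name `majority_consensus` and this data layout are hypothetical; the paper's exact consensus procedure may differ.

```python
from collections import Counter

def majority_consensus(annotations):
    """annotations: one set per annotator, each holding the IDs of candidate
    cells that annotator labelled as a mitotic figure (hypothetical layout).
    Returns the set of IDs labelled by a strict majority of annotators."""
    n = len(annotations)
    if n == 0 or n % 2 == 0:
        raise ValueError("expected an odd, non-zero number of annotators")
    # Sets guarantee each annotator contributes at most one vote per cell.
    votes = Counter(cid for labels in annotations for cid in labels)
    return {cid for cid, count in votes.items() if count > n // 2}

three_experts = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}]
print(sorted(majority_consensus(three_experts)))  # ['a', 'c']
```

With three annotators a cell needs at least two votes: "a" (3 votes) and "c" (2 votes) are kept, while "b" and "d" (1 vote each) are dropped. Requiring an odd annotator count avoids ties, which is one reason such schemes use n = 3, 5, 7 and so on.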

