LG HC MLJul 31, 2018

Inferring the ground truth through crowdsourcing

arXiv:1807.11836v1

Originality Synthesis-oriented

AI Analysis

This addresses the challenge of data labeling for whom in fields like healthcare, where ground truth is ambiguous, but it is incremental as it builds on existing crowdsourcing methods.

The paper tackles the problem of obtaining reliable ground truth for supervised learning in sensitive domains like medical imaging, where universally valid labels are costly or impossible, by proposing crowdsourcing to aggregate and verify annotations from multiple individuals or agents.

Universally valid ground truth is almost impossible to obtain or would come at a very high cost. For supervised learning without universally valid ground truth, a recommended approach is applying crowdsourcing: Gathering a large data set annotated by multiple individuals of varying possibly expertise levels and inferring the ground truth data to be used as labels to train the classifier. Nevertheless, due to the sensitivity of the problem at hand (e.g. mitosis detection in breast cancer histology images), the obtained data needs verification and proper assessment before being used for classifier training. Even in the context of organic computing systems, an indisputable ground truth might not always exist. Therefore, it should be inferred through the aggregation and verification of the local knowledge of each autonomous agent.

View on arXiv PDF

Similar