Semi-Automatic Data Annotation guided by Feature Space Projection
This work addresses the data annotation bottleneck faced by machine learning practitioners, offering an incremental improvement that combines the complementary abilities of humans and machines.
The paper tackles the laborious problem of data annotation with a semi-automatic approach that combines feature space projection and semi-supervised learning to reduce user annotation effort and improve classification accuracy on unseen data. The approach is validated on MNIST and on a challenging dataset of human intestinal parasite images, where the results indicate that it makes annotation more effective.
Data annotation using visual inspection (supervision) of each training sample can be laborious. Interactive solutions alleviate this by helping experts propagate labels from a few supervised samples to unlabeled ones based solely on the visual analysis of their feature space projection (with no further sample supervision). We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation. We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities, a large and diverse dataset that makes classification very hard. We evaluate two approaches for semi-supervised learning, one operating on the latent space and one on the projection space, and choose the one that best reduces user annotation effort while also increasing classification accuracy on unseen data. Our results demonstrate the added value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
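To make the idea of propagating labels from a few supervised samples concrete, the sketch below works on points that are assumed to be already projected to 2D. The abstract does not specify the label estimation algorithm, so this is not the method evaluated in the paper: it substitutes a simple nearest-seed rule with a distance-ratio confidence. The function name `propagate_labels`, the `min_confidence` threshold, and the toy coordinates are all hypothetical choices made for illustration.

```python
import math


def propagate_labels(points, seeds, min_confidence=0.5):
    """Propagate labels from supervised seeds to unlabeled 2D points.

    points: list of (x, y) coordinates in the projection space.
    seeds:  dict mapping point index -> label given by the user.
    Returns (auto, pending): auto maps index -> (label, confidence) for
    samples labeled automatically; pending lists indices left to the user.
    NOTE: illustrative stand-in rule, not the paper's estimator.
    """
    auto, pending = {}, []
    for i, p in enumerate(points):
        if i in seeds:
            continue
        # Distance from p to the closest seed of each class.
        nearest = {}
        for j, label in seeds.items():
            d = math.dist(p, points[j])
            if label not in nearest or d < nearest[label]:
                nearest[label] = d
        ranked = sorted(nearest.items(), key=lambda kv: kv[1])
        best_label, best_d = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else math.inf
        # Confidence approaches 1 when the best class is much closer than
        # the second best, and 0 when the two are equally close.
        confidence = 1.0 - best_d / runner_up if runner_up > 0 else 0.0
        if confidence >= min_confidence:
            auto[i] = (best_label, confidence)
        else:
            pending.append(i)
    return auto, pending


# Toy projection: two tight clusters plus one point between them.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9),
          (0.2, 0.1), (4.9, 5.2), (2.5, 2.6)]
seeds = {0: "a", 1: "a", 2: "b", 3: "b"}  # the few supervised samples

auto, pending = propagate_labels(points, seeds)
for idx, (label, conf) in sorted(auto.items()):
    print(f"sample {idx}: label={label} confidence={conf:.2f}")
print("left for manual annotation:", pending)
```

In this toy setting the two samples inside the clusters receive labels automatically, while the sample lying between the clusters falls below the threshold and is handed back to the user. That split is the kind of effort reduction the abstract describes: the expert only inspects samples whose position in the projection is ambiguous.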