CV Oct 29, 2018

Unsupervised Data Selection for Supervised Learning

Gabriele Valvano, Andrea Leo, Daniele Della Latta, Nicola Martini, Gianmarco Santini, Dante Chiappino, Emiliano Ricciardi

arXiv:1810.12142v2 1.71 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the need for better data selection methods in machine learning, but its contribution is incremental because it builds on existing supervised learning frameworks.

The paper tackles the problem of data collection for supervised learning by proposing unsupervised data selection to improve model generalization, but preliminary results are not robust and require further study.

Recent research has put great effort into the development of deep learning architectures and optimizers, obtaining impressive results in areas ranging from vision to language processing. However, little attention has been paid to the need for a methodological process of data collection. In this work we hypothesize that high-quality data for supervised learning can be selected in an unsupervised manner, and that doing so yields models that generalize better than those trained on a randomly constructed training set. However, preliminary results are not robust, and further studies on the subject should be carried out.
