The Oracle of DLphi
This addresses a foundational issue in machine learning by challenging the assumption that data quality matters, which could impact all of ML/AI if validated.
The paper tackles the problem of classification and prediction by proving that with a sufficiently large amount of labeled training data, the quality of the data becomes irrelevant, achieving exceptional results where test data labels are predicted almost always even if unrelated to training data.
We present a novel technique based on deep learning and set theory which yields exceptional classification and prediction results. Having access to a sufficiently large amount of labelled training data, our methodology is capable of predicting the labels of the test data almost always even if the training data is entirely unrelated to the test data. In other words, we prove in a specific setting that as long as one has access to enough data points, the quality of the data is irrelevant.