LG ML

Feb 25, 2019

Analyzing Data Selection Techniques with Tools from the Theory of Information Losses

arXiv:1902.09602v4

2.72 citations

Originality: Incremental advance

AI Analysis

This work provides theoretical insights for researchers in data selection and active learning, though it is incremental as it applies existing information theory frameworks to specific methods.

The paper addresses the analysis of training data selection methods by introducing new tools based on information-theoretic losses. It proves that Facility Location Selection and Transductive Experimental Design reduce these losses, and the analysis of the latter also broadens that method's scope.

In this paper, we present and illustrate some new tools for rigorously analyzing training data selection methods. These tools focus on the information theoretic losses that occur when sampling data. We use this framework to prove that two methods, Facility Location Selection and Transductive Experimental Design, reduce these losses. These are meant to act as generalizable theoretical examples of applying the field of Information Theoretic Deep Learning Theory to the fields of data selection and active learning. Both analyses yield insight into their respective methods and increase their interpretability. In the case of Transductive Experimental Design, the provided analysis greatly increases the method's scope as well.
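For readers unfamiliar with Facility Location Selection, the following is a minimal sketch of the standard greedy procedure for the facility location objective, f(S) = sum over points i of the maximum similarity between i and any selected point in S. It is not taken from the paper: the function name `facility_location_greedy`, the nonnegative similarity `1 / (1 + |x - y|)`, and the toy data are all assumptions chosen for illustration, and the paper's own formulation may differ.

```python
def facility_location_greedy(similarity, k):
    """Greedily pick up to k indices to maximize sum_i max_{j in S} similarity[i][j].

    Assumes `similarity` is a square list of lists with nonnegative entries
    (an illustrative assumption, not a requirement stated in the paper).
    """
    n = len(similarity)
    selected = []
    # best[i] holds the similarity of point i to its closest selected point so far.
    best = [0.0] * n
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain in the objective from adding candidate j.
            gain = sum(max(similarity[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], similarity[i][best_j]) for i in range(n)]
    return selected, sum(best)


# Toy data (hypothetical): two tight clusters on a line.
points = [0.0, 0.1, 5.0, 5.1]
similarity = [[1.0 / (1.0 + abs(x - y)) for y in points] for x in points]
selected, value = facility_location_greedy(similarity, k=2)
```

With two well-separated clusters and `k=2`, the greedy rule picks one representative from each cluster, because a second point from an already covered cluster adds little marginal gain.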
