Quantifying the effect of representations on task complexity
This work addresses how to optimize data representations for more efficient learning, a problem relevant to machine learning practitioners. The contribution is incremental, however, as it builds on existing complexity measures and empirical validation.
The paper investigates how input data representations affect learning complexity by proposing that better representations align the model's implicit noise distribution with the true data distribution, making tasks easier. It quantifies this effect using a task complexity score and shows empirically that statistics from linear regression can predict learning performance across different representations and neural network types.
We examine the influence of input data representations on learning complexity. We posit that, during learning, each model implicitly assumes a candidate distribution for unexplained variations in the data: its noise model. If this model distribution is not well aligned with the true distribution, then even relevant variations will be treated as noise. Crucially, however, the alignment between model and true distribution can be changed, albeit implicitly, by changing the data representation. "Better" representations align the model more closely with the true distribution, making it easier to approximate the input-output relationship in the data without discarding useful variations. To quantify this alignment effect of data representations on the difficulty of a learning task, we use an existing task complexity score and show its connection to the representation-dependent information coding length of the input. Empirically, we extract the necessary statistics from a linear regression approximation and show that they are sufficient to predict the relative learning performance of different data representations and neural network types, as obtained from an extensive neural network architecture search. We conclude that, to ensure better learning outcomes, representations may need to be tailored to both task and model so that they align with the implicit distributions of both.
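The idea of reading representation quality off a linear regression can be illustrated with a small sketch. This is not the paper's task complexity score, which the abstract does not specify. As a stand-in, and purely as an assumption, the sketch uses the residual variance of a least-squares fit and the corresponding per-sample coding length under a Gaussian noise model. The toy task, the two representations, and the helper name `linear_fit_stats` are hypothetical.

```python
import numpy as np


def linear_fit_stats(X, y):
    """Fit y ~ X by least squares (with intercept) and return two statistics.

    Returns the residual variance and the per-sample negative log-likelihood
    (in nats) under a Gaussian noise model with maximum-likelihood variance.
    Both are illustrative stand-ins, not the score used in the paper.
    """
    A = np.column_stack([X, np.ones(len(X))])      # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
    resid = y - A @ coef
    var = float(np.mean(resid ** 2))
    nll = 0.5 * (np.log(2.0 * np.pi * var) + 1.0)  # Gaussian coding length
    return var, nll


# Hypothetical toy task: the target depends on an angle through sin(theta).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)
y = np.sin(theta) + 0.05 * rng.normal(size=theta.shape)

# Two representations of the same input.
rep_raw = theta[:, None]                                    # raw angle
rep_trig = np.column_stack([np.sin(theta), np.cos(theta)])  # point on unit circle

var_raw, nll_raw = linear_fit_stats(rep_raw, y)
var_trig, nll_trig = linear_fit_stats(rep_trig, y)

print(f"raw angle:   residual variance={var_raw:.4f}, coding length={nll_raw:.3f} nats")
print(f"unit circle: residual variance={var_trig:.4f}, coding length={nll_trig:.3f} nats")
```

In this toy setting the unit-circle representation leaves far less variation to be treated as noise by the linear model, so its coding length is shorter. Whether such a ranking carries over to neural networks is the empirical question the paper investigates; the sketch only shows how the statistics could be extracted.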