LG, ML | Dec 23, 2019

The Labeling Distribution Matrix (LDM): A Tool for Estimating Machine Learning Algorithm Capacity

arXiv:1912.10597v2 | 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for tools to estimate algorithm capacity in supervised learning, though it appears incremental with preliminary results.

The paper introduces the Labeling Distribution Matrix (LDM) as a tool for estimating the capacity of machine learning algorithms. By measuring how much an algorithm can memorize, it sets a lower bound on the share of performance due to other factors such as generalization and luck. The results are not conclusive, but the LDM offers potentially valuable insight into the prediction behavior of algorithms.

Algorithm performance in supervised learning is a combination of memorization, generalization, and luck. By estimating how much information an algorithm can memorize from a dataset, we can set a lower bound on the amount of performance due to other factors such as generalization and luck. With this goal in mind, we introduce the Labeling Distribution Matrix (LDM) as a tool for estimating the capacity of learning algorithms. The method attempts to characterize the diversity of possible outputs by an algorithm for different training datasets, using this to measure algorithm flexibility and responsiveness to data. We test the method on several supervised learning algorithms, and find that while the results are not conclusive, the LDM does allow us to gain potentially valuable insight into the prediction behavior of algorithms. We also introduce the Label Recorder as an additional tool for estimating algorithm capacity, with more promising initial results.
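The idea of characterizing the diversity of an algorithm's outputs across training datasets can be illustrated with a small sketch. The abstract does not spell out how the LDM is constructed, so everything below is an assumption made for illustration: each column is simply the hard labeling a fitted model assigns to a fixed holdout set, training labels are drawn at random, and diversity is taken to be the fraction of possible binary labelings that appear. The names `one_nn`, `majority`, `labeling_matrix`, and `diversity` are hypothetical and are not taken from the paper.

```python
import random


def one_nn(train):
    """Fit a 1-nearest-neighbour predictor on a list of (x, y) pairs."""
    def predict(x):
        # Return the label of the closest training point.
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict


def majority(train):
    """Fit a predictor that always outputs the most common training label."""
    labels = [y for _, y in train]
    # Ties are broken in favour of the smaller label.
    top = max(sorted(set(labels)), key=labels.count)
    return lambda x: top


def labeling_matrix(fit, holdout, n_datasets=200, n_train=8, seed=0):
    """Collect one labeling of `holdout` per randomly drawn training set.

    Assumption: a simplified stand-in for the LDM, with hard labelings
    as columns and uniformly random training labels.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_datasets):
        train = [(rng.random(), rng.randint(0, 1)) for _ in range(n_train)]
        predict = fit(train)
        columns.append(tuple(predict(x) for x in holdout))
    return columns


def diversity(columns):
    """Fraction of all possible binary labelings that actually occur."""
    return len(set(columns)) / 2 ** len(columns[0])


holdout = [0.1, 0.3, 0.5, 0.7, 0.9]
d_nn = diversity(labeling_matrix(one_nn, holdout))
d_maj = diversity(labeling_matrix(majority, holdout))
print(f"1-NN diversity: {d_nn:.3f}, majority-class diversity: {d_maj:.3f}")
```

In this toy setting the majority-class predictor can only ever output the all-zeros or all-ones labeling, whereas the 1-nearest-neighbour predictor responds to the training data and produces many more distinct labelings. That gap is the kind of flexibility and responsiveness to data the abstract describes, though the paper's actual capacity estimate may be defined differently.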

