LGMLJun 14, 2019

A Distribution Dependent and Independent Complexity Analysis of Manifold Regularization

arXiv:1906.06100v2
AI Analysis

This provides theoretical insights for practitioners in machine learning, particularly in semi-supervised learning, but is incremental as it builds on existing manifold regularization frameworks.

The paper tackles the problem of analyzing the sample complexity of manifold regularization in semi-supervised learning, deriving distribution-dependent and independent bounds using pseudo-dimension and Rademacher complexity, and shows that semi-supervised methods can only achieve constant improvement over supervised ones in certain settings.

Manifold regularization is a commonly used technique in semi-supervised learning. It enforces the classification rule to be smooth with respect to the data-manifold. Here, we derive sample complexity bounds based on pseudo-dimension for models that add a convex data dependent regularization term to a supervised learning process, as is in particular done in Manifold regularization. We then compare the bound for those semi-supervised methods to purely supervised methods, and discuss a setting in which the semi-supervised method can only have a constant improvement, ignoring logarithmic terms. By viewing Manifold regularization as a kernel method we then derive Rademacher bounds which allow for a distribution dependent analysis. Finally we illustrate that these bounds may be useful for choosing an appropriate manifold regularization parameter in situations with very sparsely labeled data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes