A Distribution Dependent and Independent Complexity Analysis of Manifold Regularization
This provides theoretical insights for practitioners in machine learning, particularly in semi-supervised learning, but is incremental as it builds on existing manifold regularization frameworks.
The paper tackles the problem of analyzing the sample complexity of manifold regularization in semi-supervised learning, deriving distribution-dependent and independent bounds using pseudo-dimension and Rademacher complexity, and shows that semi-supervised methods can only achieve constant improvement over supervised ones in certain settings.
Manifold regularization is a commonly used technique in semi-supervised learning. It enforces the classification rule to be smooth with respect to the data-manifold. Here, we derive sample complexity bounds based on pseudo-dimension for models that add a convex data dependent regularization term to a supervised learning process, as is in particular done in Manifold regularization. We then compare the bound for those semi-supervised methods to purely supervised methods, and discuss a setting in which the semi-supervised method can only have a constant improvement, ignoring logarithmic terms. By viewing Manifold regularization as a kernel method we then derive Rademacher bounds which allow for a distribution dependent analysis. Finally we illustrate that these bounds may be useful for choosing an appropriate manifold regularization parameter in situations with very sparsely labeled data.