Dynamic Flows on Curved Space Generated by Labeled Data
This work addresses the challenge of limited labeled data faced by machine learning practitioners. The contribution appears incremental, since it builds on existing gradient flow and manifold techniques.
The authors tackle labeled data scarcity by proposing a gradient flow method that uses a source dataset to generate new samples close to a target dataset. Both datasets are lifted to the space of probability distributions on a feature-Gaussian manifold, and the flow minimizes the maximum mean discrepancy loss. The reported results show improved classification accuracy in transfer learning settings.
The scarcity of labeled data is a long-standing challenge for many machine learning tasks. We propose a gradient flow method that leverages an existing dataset (i.e., the source) to generate new samples that are close to the dataset of interest (i.e., the target). We lift both datasets to the space of probability distributions on the feature-Gaussian manifold, and then develop a gradient flow method that minimizes the maximum mean discrepancy loss. To perform the gradient flow of distributions on the curved feature-Gaussian space, we unravel the Riemannian structure of the space and compute explicitly the Riemannian gradient of the loss function induced by the optimal transport metric. For practical applications, we also propose a discretized flow, and provide conditional results guaranteeing the global convergence of the flow to the optimum. We illustrate the results of our proposed gradient flow method on several real-world datasets and show that our method can improve the accuracy of classification models in transfer learning settings.
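To make the idea of a discretized flow that minimizes the maximum mean discrepancy (MMD) concrete, the following is a minimal sketch under simplifying assumptions. It is not the authors' method: it moves unlabeled particles in flat Euclidean space, whereas the paper works on the curved feature-Gaussian manifold and uses the Riemannian gradient. The Gaussian kernel, the bandwidth and step size values, the synthetic data, and all function names (gaussian_kernel, mmd_squared, mmd_flow_step) are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=2.0):
    """Kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 h^2))."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=2.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean())


def mmd_flow_step(x, y, step_size=0.5, bandwidth=2.0):
    """One explicit Euler step moving source particles x toward target samples y.

    Each particle moves against the gradient of the witness function
    f(z) = mean_j k(z, x_j) - mean_j k(z, y_j), evaluated at the particle.
    This is a Euclidean simplification (an assumption of this sketch).
    """
    diff_xx = x[:, None, :] - x[None, :, :]
    diff_xy = x[:, None, :] - y[None, :, :]
    k_xx = gaussian_kernel(x, x, bandwidth)[:, :, None]
    k_xy = gaussian_kernel(x, y, bandwidth)[:, :, None]
    # grad_z k(z, w) = -(z - w) / h^2 * k(z, w) for the Gaussian kernel.
    grad = (-(diff_xx * k_xx).mean(axis=1)
            + (diff_xy * k_xy).mean(axis=1)) / bandwidth ** 2
    return x - step_size * grad


# Synthetic stand-ins for the source and target datasets (assumed data).
rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
target = rng.normal(loc=1.5, scale=1.0, size=(50, 2))

particles = source.copy()
for _ in range(100):
    particles = mmd_flow_step(particles, target)

print("squared MMD before:", mmd_squared(source, target))
print("squared MMD after: ", mmd_squared(particles, target))
```

The explicit Euler update here plays the role that a discretized flow plays in the abstract, but only in the simplest setting. Handling labels and the curved geometry would require the Riemannian structure described above, which this sketch does not attempt.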