LG · ML · Feb 14, 2012

Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

arXiv:1202.3758v1 · 111 citations
AI Analysis

This paper addresses the challenge of handling distributional data in machine learning. The contribution is incremental in that it extends existing methods to a new data type.

The paper tackles the problem of performing machine learning tasks like embedding, clustering, and anomaly detection when instances are continuous probability distributions, by estimating distances between these distributions from i.i.d. samples and applying them to synthetic, image, and astronomical data.

Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. Existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them to machine learning tasks on distributions, and show empirical results on synthetic data, real-world images, and astronomical data sets.
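The setting in the abstract can be illustrated with a small sketch: each instance is a bag of i.i.d. samples, and a pairwise divergence matrix between bags is estimated directly from those samples. The abstract does not name a specific estimator, so this example swaps in a standard k-nearest-neighbor estimator of the Kullback-Leibler divergence as a stand-in. All function names, the choice of k, and the synthetic Gaussian data are assumptions made for illustration, not the paper's method.

```python
import math
import random


def kth_nn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbour in `others`."""
    return sorted(math.dist(point, o) for o in others)[k - 1]


def knn_kl_divergence(x, y, k=3):
    """k-NN estimate of KL(p || q) from samples x ~ p and y ~ q.

    Assumed stand-in estimator (not taken from the source):
        (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1))
    where rho_k(i) is the k-th NN distance of x_i within x (excluding x_i)
    and nu_k(i) is the k-th NN distance from x_i into y.
    """
    n, m, d = len(x), len(y), len(x[0])
    total = 0.0
    for i, xi in enumerate(x):
        rho = kth_nn_distance(xi, x[:i] + x[i + 1:], k)
        nu = kth_nn_distance(xi, y, k)
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))


def sample_gaussian(rng, mean, n, d=2):
    """Draw n points from an isotropic unit-variance Gaussian centred at `mean`."""
    return [tuple(rng.gauss(mean, 1.0) for _ in range(d)) for _ in range(n)]


# Three hypothetical instances: two bags from the same distribution, one shifted.
rng = random.Random(0)
bags = [sample_gaussian(rng, mu, 150) for mu in (0.0, 0.0, 2.0)]

# Symmetrised divergence matrix: average of the two directed estimates.
size = len(bags)
div = [[0.0] * size for _ in range(size)]
for a in range(size):
    for b in range(a + 1, size):
        value = 0.5 * (knn_kl_divergence(bags[a], bags[b])
                       + knn_kl_divergence(bags[b], bags[a]))
        div[a][b] = div[b][a] = value

for row in div:
    print(["%.2f" % v for v in row])
```

A matrix like `div` could then be passed to any method that accepts precomputed dissimilarities, such as multidimensional scaling for embedding or a nearest-neighbor rule for classification. Because these are finite-sample estimates, entries for bags drawn from the same distribution may come out slightly negative rather than exactly zero, and the symmetrised value is a dissimilarity, not a true metric.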

