LG · Jan 10, 2014

Clustering, Coding, and the Concept of Similarity

arXiv:1401.2411v2

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of measuring similarity in data analysis and is aimed at machine learning researchers. It appears incremental, since it combines existing models, although it does so in a principled way.

The paper tackles clustering and coding by developing a theory that integrates a geometric model (a Riemannian manifold) with a probabilistic model. Together they define a dissimilarity metric that depends on the data density, which in turn yields a low-dimensional encoding of the data.

This paper develops a theory of clustering and coding which combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, ${g}_{ij}({\bf x})$, which we interpret as a measure of dissimilarity. The probabilistic model consists of a stochastic process with an invariant probability measure which matches the density of the sample input data. The link between the two models is a potential function, $U({\bf x})$, and its gradient, $\nabla U({\bf x})$. We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.
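The abstract does not state how $g_{ij}({\bf x})$ is built from $\nabla U({\bf x})$, so the sketch below is only an illustration under two labeled assumptions, not the paper's construction. First, it takes $U({\bf x}) = -\log p({\bf x})$, which makes $p$ the invariant density of the overdamped Langevin diffusion $d{\bf X} = -\nabla U\,dt + \sqrt{2}\,d{\bf W}$ (one standard way to tie a potential to a stochastic process). Second, it assumes the metric $g_{ij}({\bf x}) = \delta_{ij} + \partial_i U\,\partial_j U$, the metric induced on the graph of $U$ embedded in $\mathbb{R}^{n+1}$. The two-cluster density, its centres, and all function names (`density`, `potential`, `grad_potential`, `metric`, `curve_length`) are hypothetical.

```python
import math

# Hypothetical sample density: equal-weight mixture of two unit-variance
# isotropic 2-D Gaussians centred at (-2, 0) and (2, 0).
CENTRES = [(-2.0, 0.0), (2.0, 0.0)]


def density(x):
    """Unnormalised mixture density p(x)."""
    return sum(math.exp(-0.5 * ((x[0] - cx) ** 2 + (x[1] - cy) ** 2))
               for cx, cy in CENTRES)


def potential(x):
    """Assumed potential U(x) = -log p(x), so exp(-U) is proportional to p."""
    return -math.log(density(x))


def grad_potential(x, h=1e-5):
    """Central-difference approximation of the gradient of U at x."""
    gx = (potential((x[0] + h, x[1])) - potential((x[0] - h, x[1]))) / (2 * h)
    gy = (potential((x[0], x[1] + h)) - potential((x[0], x[1] - h))) / (2 * h)
    return (gx, gy)


def metric(x):
    """Assumed metric g_ij(x) = delta_ij + dU/dx_i * dU/dx_j, as a 2x2 list."""
    g = grad_potential(x)
    return [[1.0 + g[0] * g[0], g[0] * g[1]],
            [g[1] * g[0], 1.0 + g[1] * g[1]]]


def curve_length(a, b, steps=2000):
    """Length of the straight segment a -> b under metric(), midpoint rule."""
    dx = ((b[0] - a[0]) / steps, (b[1] - a[1]) / steps)
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        x = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        g = metric(x)
        # Quadratic form dx^T g dx for one small step along the segment.
        quad = (g[0][0] * dx[0] * dx[0] + 2 * g[0][1] * dx[0] * dx[1]
                + g[1][1] * dx[1] * dx[1])
        total += math.sqrt(quad)
    return total


if __name__ == "__main__":
    print(curve_length((-2.0, 0.0), (2.0, 0.0)))
```

Under these assumptions the segment between the two cluster centres, which has Euclidean length 4, measures roughly 4.9, because $U$ rises over the low-density gap and falls again. Where $\nabla U$ vanishes, the metric reduces to the identity, so the dissimilarity is never smaller than the Euclidean distance and grows only where the density changes.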

