Diffusion Representations
This work extends existing diffusion maps by incorporating density information into the kernel. The contribution is incremental, since it builds on prior measure-based kernel approaches.
The paper proposes a representation framework for data analysis based on a closed-form decomposition of a measure-based diffusion kernel. The representation preserves pairwise diffusion distances, does not depend on the data size, and is invariant to scale. For stationary data, no out-of-sample extension is needed.
The Diffusion Maps framework is a kernel-based method for manifold learning and data analysis that defines diffusion similarities by imposing a Markovian process on the given dataset. Analyzing this process uncovers the intrinsic geometric structures in the data. Recently, it was suggested to replace the standard kernel with a measure-based kernel that incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure-based assumption. The measure-based diffusion kernel incorporates two separate, independent representations. The first determines a measure that correlates with a density that represents normal behaviors and patterns in the data. The second consists of the analyzed multidimensional data points. In this paper, we present a representation framework for data analysis that is based on a closed-form decomposition of the measure-based kernel. The proposed representation preserves pairwise diffusion distances, does not depend on the data size, and is invariant to scale. For stationary data, no out-of-sample extension is needed for embedding newly arrived data points in the representation space. Several aspects of the presented methodology are demonstrated on analytically generated data.
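To make the starting point concrete, the following is a minimal sketch of the classical Diffusion Maps construction that the abstract refers to: a Gaussian kernel is row-normalized into a Markov transition matrix, and its spectral decomposition yields coordinates whose Euclidean distances equal the diffusion distances. It is not the measure-based kernel or the closed-form decomposition proposed in the paper. The function name diffusion_map, the parameters epsilon, n_components and t, and the choice of a Gaussian kernel are assumptions made for illustration.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Classical diffusion map embedding (illustrative sketch).

    X            : (n, d) array of data points
    epsilon      : Gaussian kernel scale (assumed parameter name)
    n_components : number of non-trivial coordinates to keep
    t            : number of diffusion steps (positive integer)
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinity kernel and its row sums (degrees).
    K = np.exp(-d2 / epsilon)
    deg = K.sum(axis=1)

    # The Markov matrix P = D^{-1} K is not symmetric, so we decompose
    # its symmetric conjugate A = D^{-1/2} K D^{-1/2} instead.
    A = K / np.sqrt(np.outer(deg, deg))
    eigvals, eigvecs = np.linalg.eigh(A)

    # Sort eigenpairs by decreasing eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of A.
    psi = eigvecs / np.sqrt(deg)[:, None]

    # Drop the trivial constant eigenvector and scale by eigenvalue^t.
    keep = slice(1, n_components + 1)
    return psi[:, keep] * eigvals[keep] ** t
```

With all n - 1 non-trivial components retained, the Euclidean distance between two rows of the returned array matches the diffusion distance computed directly from the rows of P, weighted by the inverse degrees. Truncating to fewer components gives an approximation whose quality depends on the decay of the eigenvalues.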