CV, LG. Sep 17, 2020

Learning a Deep Part-based Representation by Preserving Data Distribution

Anyong Qin, Zhaowei Shang, Zhuolin Tan, Taiping Zhang, Yuan Yan Tang

arXiv:2009.08246v1

Originality: Incremental advance

AI Analysis

This work addresses high-dimensional data recognition problems, offering an incremental improvement over existing methods for preserving intrinsic data structures.

The paper tackles unsupervised dimensionality reduction by proposing a deep autoencoder network that preserves the data distribution while learning a part-based representation, and it reports improved clustering accuracy and adjusted mutual information (AMI) on real-world datasets.

Unsupervised dimensionality reduction is a commonly used technique for high-dimensional data recognition problems. A deep autoencoder network whose weights are constrained to be non-negative can learn a low-dimensional part-based representation of the data. In addition, the inherent structure of each data cluster can be described by the distribution of its intraclass samples. One therefore hopes to learn a new low-dimensional representation that preserves the intrinsic structure embedded in the original high-dimensional data space. In this paper, a deep part-based representation is learned by preserving the data distribution; the resulting algorithm is called Distribution Preserving Network Embedding (DPNE). DPNE first estimates the distribution of the original high-dimensional data using $k$-nearest neighbor kernel density estimation, and then seeks a part-based representation that respects this distribution. Experimental results on real-world data sets show that the proposed algorithm performs well in terms of clustering accuracy and adjusted mutual information (AMI). The results indicate that the manifold structure of the raw data is well preserved in the low-dimensional feature space.
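
To make the density-estimation step concrete, the following is a minimal sketch of a $k$-nearest neighbor kernel density estimate in NumPy. The abstract does not give the paper's exact estimator, so the Gaussian kernel, the fixed `bandwidth` parameter, and the function name `knn_kde` are assumptions chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np

def knn_kde(X, k=5, bandwidth=1.0):
    """Illustrative k-NN kernel density estimate (assumed Gaussian kernel).

    For each sample, average a Gaussian kernel over its k nearest
    neighbors only, instead of over the whole data set.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Exclude each point from its own neighbor set.
    np.fill_diagonal(sq_dist, np.inf)
    # Squared distances to the k nearest neighbors of each point, shape (n, k).
    nearest = np.sort(sq_dist, axis=1)[:, :k]
    # Gaussian normalizing constant in d dimensions.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-nearest / (2.0 * bandwidth ** 2)).sum(axis=1) / (k * norm)

# Usage: a tight cluster plus one isolated point (synthetic data).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), [[5.0, 5.0]]])
density = knn_kde(X_demo, k=5)
```

In this sketch, samples inside the tight cluster receive a much higher density than the isolated point. A distribution-preserving embedding would aim to keep that ordering in the low-dimensional space, though how DPNE enforces this is not specified in the abstract.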
