CV · Apr 16, 2023

Autoencoders with Intrinsic Dimension Constraints for Learning Low Dimensional Image Representations

Jianzhang Zheng, Hao Shen, Jian Yang, Xuan Tang, Mingsong Chen, Hui Yu, Jielong Guo, Xian Wei

arXiv:2304.07686v1 · 2.81 citations · h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses the need for more discriminative low-dimensional image representations in computer vision. It is an incremental contribution: it builds on existing autoencoder methods by adding intrinsic dimension (ID) regularization.

The paper tackles the problem that autoencoders ignore the preservation of intrinsic dimension in image representations. It proposes a novel approach that incorporates global and local ID constraints, which improves performance on downstream tasks such as classification and clustering across benchmark datasets.

Autoencoders have achieved great success in various computer vision applications. An autoencoder learns appropriate low-dimensional image representations through a self-supervised paradigm, namely reconstruction. Existing studies mainly focus on minimizing the pixel-level reconstruction error of images, while ignoring the preservation of Intrinsic Dimension (ID), which is a fundamental geometric property of data representations in Deep Neural Networks (DNNs). Motivated by the important role of ID, in this paper we propose a novel deep representation learning approach with autoencoders, which incorporates global and local ID constraints as regularizers into the reconstruction of data representations. This approach not only preserves the global manifold structure of the whole dataset, but also maintains the local manifold structure of the feature maps of each point, which makes the learned low-dimensional features more discriminative and improves the performance of downstream algorithms. To the best of our knowledge, few existing works exploit both global and local ID-invariant properties to regularize autoencoders. Experimental results on benchmark datasets (Extended Yale B, Caltech101 and ImageNet) show that the resulting regularized models learn more discriminative representations for downstream tasks, including image classification and clustering.
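The abstract does not say how ID is estimated or how the global and local constraints enter the loss. The sketch below is a hypothetical illustration under stated assumptions, not the authors' method. It assumes the TwoNN intrinsic-dimension estimator (Facco et al., 2017) in its maximum-likelihood form, a global term that matches the ID of the latent codes to the ID of the inputs, and a local term that matches the ID of each sample's encoder feature map (treated as a point cloud of spatial positions) to that of a paired decoder feature map. The names `twonn_id`, `id_regularized_loss`, `lam_global` and `lam_local`, and the encoder/decoder pairing, are all hypothetical.

```python
import numpy as np


def twonn_id(points: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (MLE form), assumed estimator.

    points: array of shape (n_points, n_features). For each point, the ratio
    mu = r2 / r1 of the distances to its 2nd and 1st nearest neighbours is
    Pareto-distributed with exponent d; the MLE is d = N / sum(log mu).
    """
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # exclude each point's self-distance
    nearest = np.sort(dist, axis=1)[:, :2]   # r1 and r2 for every point
    r1, r2 = nearest[:, 0], nearest[:, 1]
    mask = r1 > 0                            # drop exact duplicate points
    mu = r2[mask] / r1[mask]
    return float(mask.sum() / np.log(mu).sum())


def id_regularized_loss(x, x_hat, z, feat_enc, feat_dec,
                        lam_global=0.1, lam_local=0.1):
    """Hypothetical objective: reconstruction + global and local ID penalties.

    x, x_hat:           inputs and reconstructions, shape (batch, n_pixels)
    z:                  latent codes, shape (batch, latent_dim)
    feat_enc, feat_dec: paired feature maps, shape (batch, positions, channels)
    """
    # Pixel-level reconstruction error (mean squared error).
    recon = np.mean((x - x_hat) ** 2)
    # Global term: the latent codes should keep the ID of the input batch.
    global_term = (twonn_id(z) - twonn_id(x)) ** 2
    # Local term: each sample's feature-map ID should be preserved between
    # the paired encoder and decoder feature maps (assumed pairing).
    local_enc = np.array([twonn_id(f) for f in feat_enc])
    local_dec = np.array([twonn_id(f) for f in feat_dec])
    local_term = np.mean((local_enc - local_dec) ** 2)
    return recon + lam_global * global_term + lam_local * local_term
```

On points drawn from a 2-D plane embedded in 10 dimensions, `twonn_id` should return a value close to 2. This NumPy version only evaluates the loss; training an autoencoder with such penalties would require a differentiable framework, and the hard nearest-neighbour selection in TwoNN would need care to yield useful gradients.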

