CV

Feb 13, 2024

Latent space configuration for improved generalization in supervised autoencoder neural networks

arXiv:2402.08441v3

6 citations

h-index: 2

AI Analysis

This work addresses the need for more stable and interpretable training in supervised autoencoders, with applications in tasks like clothes texture classification and cross-dataset searches, though it is incremental in nature.

The paper tackles the problem of uncontrolled latent space properties in autoencoders by proposing two methods for configuring latent space topology, which improves generalization to unseen data and enables similarity estimation without decoders or classifiers. Results show effective generalization across datasets like LIP, Market1501, and WildTrack without fine-tuning.

Autoencoders (AE) are a simple yet powerful class of neural networks that compress data by projecting the input into a low-dimensional latent space (LS). Although the LS is formed by minimizing the loss function during training, its properties and topology are not controlled directly. In this paper we focus on AE LS properties and propose two methods for obtaining an LS with a desired topology, which we call LS configuration. The proposed methods are loss configuration, which uses a geometric loss term that acts directly in the LS, and encoder configuration. We show that the former makes it possible to reliably obtain an LS with the desired configuration by defining the positions and shapes of LS clusters for a supervised AE (SAE). Knowing the LS configuration makes it possible to define a similarity measure in the LS to predict labels or estimate similarity for multiple inputs without using decoders or classifiers. We also show that this leads to more stable and interpretable training. We show that an SAE trained for clothes texture classification using the proposed method generalizes well to unseen data from the LIP, Market1501, and WildTrack datasets without fine-tuning, and even makes it possible to evaluate similarity for unseen classes. We further illustrate the advantages of pre-configured LS similarity estimation with cross-dataset searches and text-based search using a text query without language models.
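The abstract does not state the form of the geometric loss term or of the similarity measure. As a rough illustration only, the sketch below assumes each class is assigned a fixed target center in the LS, the geometric term is the mean squared distance of a latent vector to its class center, and labels are predicted by the nearest center. The function names, the one-hot centers, and the toy latent vectors are all hypothetical and are not taken from the paper.

```python
import math

def squared_distance(a, b):
    # Squared Euclidean distance between two latent vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def geometric_loss(latents, labels, centers):
    # Assumed geometric term: mean squared distance of each latent
    # vector to the predefined center of its own class.
    total = sum(squared_distance(z, centers[y]) for z, y in zip(latents, labels))
    return total / len(latents)

def predict_label(z, centers):
    # With known cluster positions, the label is the index of the
    # nearest center; no decoder or classifier head is involved.
    return min(range(len(centers)), key=lambda k: squared_distance(z, centers[k]))

def cosine_similarity(a, b):
    # One possible similarity measure between two latent vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical configuration: three classes placed at one-hot positions.
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy latent vectors standing in for encoder outputs.
latents = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
labels = [0, 1, 2]

loss = geometric_loss(latents, labels, centers)
predictions = [predict_label(z, centers) for z in latents]
similarity = cosine_similarity(latents[0], latents[1])
```

In a training loop, a term like `geometric_loss` would be added to the usual reconstruction loss. Because the cluster positions are fixed in advance, the same distance computation can be reused at inference time for label prediction and for comparing inputs from other datasets.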
