QM LG ML · Sep 29, 2019

Natural representation of composite data with replicated autoencoders

Matteo Negri, Davide Bergamini, Carlo Baldassi, Riccardo Zecchina, Christoph Feinauer

arXiv:1909.13327v1

Originality: Incremental advance

AI Analysis

This method addresses the challenge of robust feature inference in composite data for fields like biology, though it appears incremental as it builds on existing autoencoder frameworks.

The authors tackled the problem of inferring basic features from composite data, such as biological sequences, by introducing an unsupervised autoencoder-based method trained by optimizing the local entropy rather than the standard loss. This considerably improved performance on both synthetic and protein sequence data.
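
For reference, the local entropy is usually written as follows in the local-entropy literature. This is a sketch of the general form, not necessarily the exact normalization or distance used in this paper; the symbols L (training loss), W (weights), β (inverse temperature), γ (coupling strength) and y (number of replicas) are notation assumed here. For integer y, the exponentiated local entropy factorizes into y identical integrals, which is why it can be optimized by simulating y coupled copies ("replicas") of the model:

```latex
\mathcal{F}(\tilde W) = \log \int dW \,
  \exp\!\Big( -\beta\, L(W) - \frac{\gamma}{2}\, \lVert W - \tilde W \rVert^2 \Big)

e^{\,y\,\mathcal{F}(\tilde W)} = \int \prod_{a=1}^{y} dW_a \,
  \exp\!\Big( -\beta \sum_{a=1}^{y} L(W_a)
              - \frac{\gamma}{2} \sum_{a=1}^{y} \lVert W_a - \tilde W \rVert^2 \Big)
```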

Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the 'local entropy' rather than the standard loss, resulting in a more robust inference, and enhancing the performance on this type of data considerably. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.
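
To make "an interacting system of replicated autoencoders" concrete, here is a minimal NumPy sketch, assuming a tied-weight, single-hidden-layer autoencoder with a squared-error loss and an elastic coupling that pulls every replica toward their mean. The architecture, the coupling schedule, and all names and hyperparameters (`ae_loss_and_grad`, `train_replicated`, `gamma0`, `gamma_growth`, `gamma_max`, and so on) are hypothetical illustrations. This is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ae_loss_and_grad(W, X):
    """Mean-squared reconstruction loss of a tied-weight autoencoder
    X_hat = sigmoid(X @ W) @ W.T, and its gradient with respect to W."""
    n = X.shape[0]
    H = sigmoid(X @ W)              # hidden representation, shape (n, k)
    R = H @ W.T - X                 # reconstruction residual, shape (n, d)
    loss = 0.5 * np.sum(R ** 2) / n
    G = R / n                       # dLoss/dX_hat
    grad_dec = G.T @ H              # contribution of W used as decoder
    dZ = (G @ W) * H * (1.0 - H)    # backprop through the sigmoid
    grad_enc = X.T @ dZ             # contribution of W used as encoder
    return loss, grad_dec + grad_enc


def train_replicated(X, k, n_replicas=5, lr=0.05, gamma0=1e-3,
                     gamma_growth=1.01, gamma_max=1.0, steps=2000, seed=0):
    """Hypothetical replicated training: gradient descent on
    sum_a L(W_a) + gamma/2 * sum_a ||W_a - C||^2, with C the replica mean
    and gamma slowly increased (capped so the updates stay stable)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    replicas = [0.1 * rng.standard_normal((d, k)) for _ in range(n_replicas)]
    gamma = gamma0
    center_losses = []
    for _ in range(steps):
        center = np.mean(replicas, axis=0)
        center_losses.append(ae_loss_and_grad(center, X)[0])
        updated = []
        for W in replicas:
            _, g = ae_loss_and_grad(W, X)
            # Elastic coupling pulls each replica toward the center. Because
            # the deviations from the mean sum to zero, gamma * (W - center)
            # is the exact gradient of the coupling term even though the
            # center itself depends on W.
            updated.append(W - lr * (g + gamma * (W - center)))
        replicas = updated
        gamma = min(gamma * gamma_growth, gamma_max)  # gradually tighten coupling
    return np.mean(replicas, axis=0), replicas, center_losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((50, 8))  # toy data, not from the paper
    center, replicas, losses = train_replicated(X, k=3)
    print(f"center loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this sketch, a strong coupling forces the replicas to agree, so the center is drawn toward wide, flat regions of the loss where many nearby weight configurations perform well. This is the intuition behind optimizing local entropy rather than the plain loss. Increasing `gamma` over time mirrors the "scoping" schedules used in the local-entropy literature, and capping it at `gamma_max` keeps `lr * gamma` small enough for plain gradient descent to remain stable.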

