LG IT MLFeb 7, 2024

Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

Kevin Kögler, Alexander Shevchenko, Hamed Hassani, Marco Mondelli

arXiv:2402.05013v17.95 citationsh-index: 3ICML

Originality Incremental advance

AI Analysis

This work addresses a foundational gap in autoencoder theory for data compression, offering provable insights that could enhance efficiency in domains like image processing, though it is incremental in building on existing methods.

The paper tackles the theoretical understanding of autoencoders for compressing structured data, proving that shallow autoencoders fail to exploit sparsity in data like sparse Gaussians, but shows adding nonlinearities and depth provably improves compression performance, with validation on image datasets such as CIFAR-10 and MNIST.

Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it was compressing a Gaussian source - with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already reduces the loss provably, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets, such as CIFAR-10 and MNIST.

View on arXiv PDF

Similar