Discovering Hidden Factors of Variation in Deep Networks
This work addresses the challenge of extracting interpretable factors of variation beyond the classification signal in deep networks. The contribution is incremental: it builds on existing autoencoder methods and adds a new regularization term.
The paper tackles the problem of learning hidden factors of variation beyond the classification signal in deep networks. It demonstrates that augmenting autoencoders with a cross-covariance penalty (XCov) can disentangle factors such as handwriting style and subject identity, with experiments on MNIST, TFD, and Multi-PIE.
Deep learning has enjoyed a great deal of success because of its ability to learn useful features for tasks such as classification. However, there has been less exploration of learning the factors of variation apart from the classification signal. By augmenting autoencoders with simple regularization terms during training, we demonstrate that standard deep architectures can discover and explicitly represent factors of variation beyond those relevant for categorization. We introduce a cross-covariance penalty (XCov) as a method to disentangle factors such as handwriting style for digits and subject identity in faces. We demonstrate this on the MNIST handwritten digit database, the Toronto Faces Database (TFD), and the Multi-PIE dataset by generating manipulated instances of the data. Furthermore, we demonstrate that these deep networks can extrapolate "hidden" variation in the supervised signal.
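To make the idea of a cross-covariance penalty concrete, here is a minimal NumPy sketch. The text above does not give the formula, so the form used here (half the sum of squared cross-covariances between two groups of units, computed over a batch) is an assumption, as are the function name `xcov_penalty` and the variable names `y` (class-related units) and `z` (remaining latent units). It is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

def xcov_penalty(y, z):
    """Half the sum of squared cross-covariances between y and z.

    y: array of shape (N, Dy), e.g. class-related units (assumed naming).
    z: array of shape (N, Dz), e.g. remaining latent units (assumed naming).
    The exact form of the penalty is an assumption for illustration.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    n = y.shape[0]
    # Center each unit over the batch.
    yc = y - y.mean(axis=0, keepdims=True)
    zc = z - z.mean(axis=0, keepdims=True)
    # (Dy, Dz) matrix of batch cross-covariances.
    cross_cov = yc.T @ zc / n
    return 0.5 * np.sum(cross_cov ** 2)

# Toy batch: z_copy duplicates y, z_unrelated is uncorrelated with y.
y = np.array([[1.0], [-1.0], [1.0], [-1.0]])
z_copy = y.copy()
z_unrelated = np.array([[1.0], [1.0], [-1.0], [-1.0]])

penalty_copy = xcov_penalty(y, z_copy)            # 0.5
penalty_unrelated = xcov_penalty(y, z_unrelated)  # 0.0
```

The penalty is zero when the two groups of units are uncorrelated over the batch and grows as they covary, which is what would push the latent units to encode variation not already carried by the class-related units. In training, a term like this would presumably be weighted and added to the autoencoder's other losses; the weighting is not specified here.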