LG AI MLDec 8, 2022

A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

DeepMind

arXiv:2212.04025v110.48 citationsh-index: 43

Originality Incremental advance

AI Analysis

This addresses a bottleneck in scalable dimensionality reduction for large datasets, though it is incremental as it builds on existing methods like Oja's rule.

The paper tackles the problem of learning principal subspaces from sample entries of large matrices, which is useful for low-dimensional representations in applications like neuroscience and deep reinforcement learning, and presents an algorithm that scales to infinite rows and columns, with experiments on synthetic data, MNIST, and PuddleWorld showing its effectiveness.

Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of a given matrix from sample entries, i.e. from small random submatrices. Although a number of sample-based methods exist for this problem (e.g. Oja's rule \citep{oja1982simplified}), these assume access to full columns of the matrix or particular matrix structure such as symmetry and cannot be combined as-is with neural networks \citep{baldi1989neural}. In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled. We complement our theoretical analysis with a series of experiments on synthetic matrices, the MNIST dataset \citep{lecun2010mnist} and the reinforcement learning domain PuddleWorld \citep{sutton1995generalization} demonstrating the usefulness of our approach.

View on arXiv PDF

Similar