ML · LG · Nov 27, 2020

Deep orthogonal linear networks are shallow

arXiv:2011.13831v1
AI Analysis

This work offers theoretical insight into the training dynamics of a specific type of deep linear network, which is relevant to researchers studying the theoretical underpinnings of deep learning.

This paper demonstrates that training a deep orthogonal linear network using Riemannian gradient descent is mathematically equivalent to training a single-layer shallow network. This implies that overparametrization and implicit bias have no effect in this specific deep learning setting.
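The following derivation sketches why the product of the factors behaves like a single matrix. It rests on our own assumptions, not necessarily the paper's exact setup: the embedded metric on O(d), under which the Riemannian gradient of a factor is skew(∇L Wᵀ)W, and the exponential retraction with all factors updated simultaneously. The symbols A_i, B_i and Ω are introduced here for illustration only.

```latex
\begin{aligned}
&W = W_N \cdots W_1,\quad W_i \in O(d),\quad G = \nabla L(W),\quad \operatorname{skew}(M) = \tfrac12 (M - M^\top),\\
&A_i = W_N \cdots W_{i+1}\ (A_N = I,\ A_0 = W),\qquad B_i = W_{i-1} \cdots W_1\ (B_1 = I),\qquad W = A_i W_i B_i,\\
&\nabla_{W_i} L = A_i^\top G B_i^\top
 \;\Rightarrow\; \nabla_{W_i} L\, W_i^\top = A_i^\top G (W_i B_i)^\top = A_i^\top G W^\top A_i,\\
&\operatorname{skew}(\nabla_{W_i} L\, W_i^\top) = A_i^\top \Omega A_i,\qquad \Omega = \operatorname{skew}(G W^\top),\\
&W_i^{+} = e^{-\eta A_i^\top \Omega A_i} W_i = A_i^\top e^{-\eta \Omega} A_i W_i = A_i^\top e^{-\eta \Omega} A_{i-1},\\
&W_N^{+} \cdots W_1^{+} = (A_N^\top e^{-\eta\Omega} A_{N-1})(A_{N-1}^\top e^{-\eta\Omega} A_{N-2}) \cdots (A_1^\top e^{-\eta\Omega} A_0) = e^{-N\eta\Omega}\, W.
\end{aligned}
```

Under these assumptions, the last line is one Riemannian gradient step on the single orthogonal matrix W with step size Nη. The way W is split into factors drops out entirely.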

We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices with no non-linearity in between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that overparametrization and implicit bias have no effect at all in this setting: training such a deep, overparametrized network is perfectly equivalent to training a one-layer shallow network.
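The claimed equivalence can be checked numerically. The sketch below rests on several assumptions of ours, not the paper's stated setup: a squared loss on synthetic data, the embedded-metric Riemannian gradient skew(∇L Wᵀ)W, and the exponential retraction. The helper names (`deep_step`, `shallow_step`, `product`, `loss_grad`) are hypothetical. The script trains N orthogonal factors simultaneously and compares their product with a single orthogonal matrix trained using step size N·lr.

```python
import numpy as np
from scipy.linalg import expm


def skew(m):
    """Skew-symmetric part (M - M^T) / 2."""
    return 0.5 * (m - m.T)


def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs so the result is well defined


def product(factors, d):
    """For factors [W_1, ..., W_k], return W_k ... W_1 (identity if empty)."""
    out = np.eye(d)
    for w in factors:
        out = w @ out
    return out


def loss_grad(w, x, y):
    """Euclidean gradient of the assumed loss 0.5 * ||W X - Y||_F^2 / n_samples."""
    return (w @ x - y) @ x.T / x.shape[1]


def deep_step(factors, x, y, lr):
    """One simultaneous Riemannian GD step on every orthogonal factor (exponential retraction)."""
    d = factors[0].shape[0]
    g = loss_grad(product(factors, d), x, y)
    updated = []
    for i, w_i in enumerate(factors):
        left = product(factors[i + 1:], d)   # factors applied after W_i
        right = product(factors[:i], d)      # factors applied before W_i
        g_i = left.T @ g @ right.T           # chain rule: Euclidean gradient w.r.t. this factor
        updated.append(expm(-lr * skew(g_i @ w_i.T)) @ w_i)
    return updated


def shallow_step(w, x, y, lr):
    """One Riemannian GD step on a single orthogonal matrix (exponential retraction)."""
    g = loss_grad(w, x, y)
    return expm(-lr * skew(g @ w.T)) @ w


rng = np.random.default_rng(0)
d, n_layers, n_samples, lr, n_steps = 4, 5, 20, 0.05, 50
x = rng.standard_normal((d, n_samples))
y = random_orthogonal(d, rng) @ x             # synthetic targets from a hidden orthogonal map
factors = [random_orthogonal(d, rng) for _ in range(n_layers)]
w_shallow = product(factors, d)               # same starting end-to-end matrix

for _ in range(n_steps):
    factors = deep_step(factors, x, y, lr)
    w_shallow = shallow_step(w_shallow, x, y, n_layers * lr)  # depth-scaled step size

gap = np.max(np.abs(product(factors, d) - w_shallow))
print(f"max |deep product - shallow| = {gap:.1e}")
```

With a single factor, the two update rules coincide by construction. With several factors, the printed gap should stay at floating-point round-off level. The exponential map is the key choice here, for two reasons. First, it commutes with orthogonal conjugation. Second, exp(-lr Ω)^N = exp(-N lr Ω), where Ω is the skew part of ∇L(W)Wᵀ. Together, these properties collapse the factor updates into one step. A Cayley-transform retraction also commutes with conjugation but lacks the second property, so with it the match would generally be only approximate.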
