Redundancy in Deep Linear Neural Networks
This work offers incremental insight into the optimization properties of deep linear neural networks, which may help clarify the constraints of more complex architectures such as convolutional and non-linear networks.
The paper challenges conventional wisdom by arguing that, in practice, training a deep linear fully-connected network with conventional optimizers is convex in the same way as training a single linear layer. This offers a new conceptual understanding of linear networks.
Conventional wisdom holds that deep linear neural networks enjoy expressiveness and optimization advantages over a single linear layer. This paper suggests that, in practice, training a deep linear fully-connected network with conventional optimizers is convex in the same manner as training a single linear fully-connected layer. This paper explains and demonstrates this claim. Although convolutional networks do not fit this description, this work aims to build a new conceptual understanding of fully-connected linear networks that may shed light on the possible constraints of convolutional settings and non-linear architectures.
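The redundancy in the title rests on a standard observation. Without activations or biases, an L-layer fully-connected network computes f(x) = W_L ... W_2 W_1 x = W x, so it represents exactly the linear maps a single layer can. The one difference is that the rank of the end-to-end matrix W is capped by the narrowest hidden width. The minimal sketch below checks this numerically in plain Python. The helper names (`matmul`, `collapse`, `rank`), the layer widths, and the random seed are illustrative assumptions. It demonstrates only the forward-pass collapse, not the paper's argument that training is convex.

```python
from functools import reduce
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Matrix, x: list[float]) -> list[float]:
    """Multiply a matrix by a column vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian-initialized weight matrix (illustrative choice)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def deep_linear_forward(weights: list[Matrix], x: list[float]) -> list[float]:
    """Apply W_1, then W_2, ..., then W_L: no biases, no activations."""
    for w in weights:
        x = matvec(w, x)
    return x


def collapse(weights: list[Matrix]) -> Matrix:
    """Return the single end-to-end matrix W = W_L @ ... @ W_1."""
    return reduce(lambda acc, w: matmul(w, acc), weights[1:], weights[0])


def rank(a: Matrix, tol: float = 1e-9) -> int:
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= f * m[r][j]
        r += 1
        if r == n_rows:
            break
    return r


# Hypothetical 4-layer network: input dim 4, output dim 5, width-3 bottleneck.
rng = random.Random(0)
widths = [4, 8, 3, 8, 5]
weights = [random_matrix(widths[i + 1], widths[i], rng)
           for i in range(len(widths) - 1)]
x = [rng.gauss(0.0, 1.0) for _ in range(widths[0])]

deep = deep_linear_forward(weights, x)   # layer-by-layer forward pass
w_end = collapse(weights)                # one 5 x 4 matrix
single = matvec(w_end, x)                # single-layer forward pass

print(max(abs(a - b) for a, b in zip(deep, single)))  # floating-point round-off only
print(rank(w_end))  # at most 3, the narrowest hidden width
```

The deep and collapsed forward passes differ only by round-off, so depth adds no expressive power here. The bottleneck layer of width 3 shows the one structural effect depth can have on a linear network: a rank constraint on W.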