NE · LG · ML · Oct 31, 2016

Tensor Switching Networks

arXiv:1610.10087v1 · 10 citations
Originality: Highly original
AI Analysis

The paper addresses the vanishing gradient problem, a fundamental issue in deep learning, with a novel architecture that could have broad impact for researchers and practitioners.

The authors tackle the vanishing gradient problem by introducing Tensor Switching (TS) networks, which generalize ReLU to tensor-valued hidden units so that a linear readout can implement expressive functions. In their experiments, TS networks are more expressive and learn faster than standard ReLU networks.

We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.
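The copying mechanism described in the abstract can be illustrated with a minimal NumPy sketch of a single hidden layer. It is one reading of the abstract, not the authors' implementation: the names `ts_representation` and `relu_readout`, the layer sizes, and the random weights are all assumptions made for illustration. Each hidden unit that a ReLU would switch on receives a full copy of the input vector, and inactive units receive zeros. A linear readout over this expanded representation can then reproduce a standard one-hidden-layer ReLU network as a special case.

```python
import numpy as np

def ts_representation(W, x):
    """Expanded representation for one hidden layer (illustrative sketch).

    Row i holds a copy of x if hidden unit i is active (W[i] @ x > 0),
    and zeros otherwise.
    """
    active = (W @ x > 0).astype(x.dtype)  # binary activity pattern of the hidden units
    return np.outer(active, x)            # shape: (n_hidden, n_in)

def relu_readout(W, v, x):
    """Scalar output of a standard one-hidden-layer ReLU network, for comparison."""
    return v @ np.maximum(W @ x, 0.0)

# Hypothetical sizes and random weights, chosen only for the demonstration.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 6
W = rng.standard_normal((n_hidden, n_in))
v = rng.standard_normal(n_hidden)
x = rng.standard_normal(n_in)

Z = ts_representation(W, x)

# A linear readout with weights R[i, j] = v[i] * W[i, j] recovers the ReLU network,
# because sum_ij v_i W_ij a_i x_j = sum_i v_i * relu(W_i . x).
R = v[:, None] * W
ts_out = np.sum(R * Z)
relu_out = relu_readout(W, v, x)
```

In this sketch the readout `R` has `n_hidden * n_in` free parameters, compared with `n_hidden` for the ReLU readout `v`. This is the larger representation size that the abstract names as the cost of the approach, and a readout trained directly on `Z` is not restricted to the factored form `v[:, None] * W`.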

Code Implementations: 1 repo
