LG · ML · Oct 9, 2018

The Outer Product Structure of Neural Network Derivatives

arXiv:1810.03798v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses how researchers and practitioners can efficiently leverage higher-order derivatives in neural network training. It appears incremental, however, since it builds on known derivative properties and presents no new empirical results.

The paper shows that feedforward and recurrent neural networks have an outer product derivative structure. This structure makes higher-order information usable without approximations or excessive memory, and it may also offer insight into the geometry of network optima and suggest new approaches to regularization.
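To make "outer product derivative structure" concrete, the sketch below uses the simplest case it is compatible with: a single dense layer `y = W @ x` trained with a squared loss on one sample. The gradient with respect to `W` is the outer product of the backpropagated error `dL/dy` and the input `x`, so it is a rank-one matrix. The toy layer, the loss, and the helper names `dense_grad_weights` and `finite_diff_grad` are illustrative assumptions, not the paper's notation or derivation, which covers multilayer and recurrent networks.

```python
import numpy as np

def dense_grad_weights(W, x, t):
    """Gradient of L = 0.5 * ||W @ x - t||^2 with respect to W (toy example)."""
    delta = W @ x - t            # dL/dy, shape (m,)
    return np.outer(delta, x)    # dL/dW = delta x^T, shape (m, n)

def finite_diff_grad(W, x, t, eps=1e-6):
    """Central finite differences, used only to check the outer-product formula."""
    loss = lambda M: 0.5 * np.sum((M @ x - t) ** 2)
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
t = rng.standard_normal(3)

G = dense_grad_weights(W, x, t)
print(np.allclose(G, finite_diff_grad(W, x, t), atol=1e-6))  # True
print(np.linalg.matrix_rank(G))                              # 1
```

For a mini-batch, the gradient becomes a sum of such rank-one terms, one per sample, so it can be stored as the pair of factor vectors per sample rather than as a dense matrix.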

In this paper, we show that feedforward and recurrent neural networks exhibit an outer product derivative structure but that convolutional neural networks do not. This structure makes it possible to use higher-order information without needing approximations or infeasibly large amounts of memory, and it may also provide insights into the geometry of neural network optima. The ability to easily access these derivatives also suggests a new, geometric approach to regularization. We then discuss how this structure could be used to improve training methods, increase network robustness and generalizability, and inform network compression methods.
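The memory claim can be illustrated on the same toy setting, a single dense layer with loss `L = 0.5 * ||W @ x - t||^2`, which is an assumption chosen for simplicity and not the paper's general construction. Here the Hessian with respect to the row-major flattening of `W` is `kron(I_m, x x^T)`, built from the outer product `x x^T`. A Hessian-vector product therefore reduces to `(V @ x) x^T`, which needs only O(mn) memory instead of the O(m²n²) of the explicit Hessian. The function names `hvp_outer` and `explicit_hessian` are hypothetical.

```python
import numpy as np

def hvp_outer(V, x):
    """Hessian-vector product for L = 0.5*||W x - t||^2 w.r.t. W, as an (m, n) matrix.

    Uses H = kron(I_m, x x^T) implicitly: H vec(V) = vec((V x) x^T).
    For this loss the Hessian does not depend on W or t.
    """
    return np.outer(V @ x, x)

def explicit_hessian(m, x):
    """Full (m*n) x (m*n) Hessian for row-major vec(W); only feasible for tiny layers."""
    return np.kron(np.eye(m), np.outer(x, x))

rng = np.random.default_rng(1)
m, n = 3, 4
x = rng.standard_normal(n)
V = rng.standard_normal((m, n))   # direction in weight space

H = explicit_hessian(m, x)
hv_full = (H @ V.reshape(-1)).reshape(m, n)
hv_fast = hvp_outer(V, x)

print(np.allclose(hv_full, hv_fast))  # True
print(H.size, hv_fast.size)           # 144 12
```

The explicit Hessian is built here only to check the factored product; in a realistic layer it would be the part that does not fit in memory.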

