Modular Block-diagonal Curvature Approximations for Feedforward Architectures
This work presents a modular method for approximating curvature in machine learning. The contribution is incremental: it builds on and generalizes existing block-diagonal approaches.
The authors tackle the problem of computing block-diagonal approximations to curvature matrices (e.g., the Hessian) in feedforward architectures. They propose a modular extension of backpropagation that simplifies manual derivations and integrates easily into existing libraries.
We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach breaks the otherwise tedious manual derivation of these matrices down into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
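To make the modular idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single sample, a two-layer network (linear, sigmoid, linear), and a squared-error loss; the function names (`linear_backward`, `sigmoid_backward`, `block_hessians`) are hypothetical. Each module receives the gradient and Hessian of the loss with respect to its output and passes both back to its input. Linear modules additionally emit the diagonal block for their own parameters. Setting `ggn=True` drops the second-order activation term, which under these assumptions gives the generalized Gauss-Newton blocks.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def linear_backward(W, a_in, g_out, H_out):
    """Module z = W @ a_in + b.

    g_out, H_out: gradient and Hessian of the loss w.r.t. the output z.
    Returns gradient and Hessian w.r.t. the input, plus parameter blocks.
    """
    g_in = W.T @ g_out
    H_in = W.T @ H_out @ W
    # Block for W.reshape(-1) (row-major flattening); exact because z is linear in W.
    H_W = np.kron(H_out, np.outer(a_in, a_in))
    H_b = H_out
    return g_in, H_in, H_W, H_b


def sigmoid_backward(a_out, g_out, H_out, ggn=False):
    """Module a_out = sigmoid(z), applied elementwise."""
    d1 = a_out * (1.0 - a_out)        # first derivative of the sigmoid
    d2 = d1 * (1.0 - 2.0 * a_out)     # second derivative of the sigmoid
    g_in = d1 * g_out
    H_in = d1[:, None] * H_out * d1[None, :]
    if not ggn:
        # Second-order term of the activation; omitted for Gauss-Newton.
        H_in = H_in + np.diag(d2 * g_out)
    return g_in, H_in


def loss_fn(params, x, y):
    W1, b1, W2, b2 = params
    z2 = W2 @ sigmoid(W1 @ x + b1) + b2
    return 0.5 * np.sum((z2 - y) ** 2)


def block_hessians(params, x, y, ggn=False):
    """Per-parameter diagonal curvature blocks for one sample."""
    W1, b1, W2, b2 = params
    # Forward pass, keeping the intermediate activation.
    a1 = sigmoid(W1 @ x + b1)
    z2 = W2 @ a1 + b2
    # Squared-error loss: gradient is the residual, Hessian is the identity.
    g = z2 - y
    H = np.eye(len(y))
    # Backward pass, one module at a time.
    g, H, H_W2, H_b2 = linear_backward(W2, a1, g, H)
    g, H = sigmoid_backward(a1, g, H, ggn=ggn)
    _, _, H_W1, H_b1 = linear_backward(W1, x, g, H)
    return {"W1": H_W1, "b1": H_b1, "W2": H_W2, "b2": H_b2}
```

In this small setting the `W1` block can be checked against a finite-difference Hessian of `loss_fn`. Mini-batch averaging, the positive-curvature Hessian, and convolutional layers are left out of the sketch.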