Neural Networks with Cheap Differential Operators
This work addresses a computational bottleneck in physics and machine learning applications: some differential operators, such as the divergence, cost more to compute than gradients. The contribution is incremental in that it builds on existing neural network and automatic differentiation methods.
The paper tackles the problem of efficiently computing differential operators that are more expensive than gradients, such as the divergence, by proposing a restricted architecture that enables cheap dimension-wise derivatives. The approach is demonstrated on implicit ODE solvers, continuous normalizing flows, and stochastic differential equation models, showing practical efficiency gains.
Gradients of neural networks can be computed efficiently for any architecture, but some applications require differential operators with higher time complexity. We describe a family of restricted neural network architectures that allow efficient computation of a family of differential operators involving dimension-wise derivatives, used in cases such as computing the divergence. Our proposed architecture has a Jacobian matrix composed of diagonal and hollow (non-diagonal) components. We can then modify the backward computation graph to extract dimension-wise derivatives efficiently with automatic differentiation. We demonstrate these cheap differential operators for solving root-finding subproblems in implicit ODE solvers, exact density evaluation for continuous normalizing flows, and evaluating the Fokker-Planck equation for training stochastic differential equation models.
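
The diagonal-plus-hollow structure can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python toy under the assumption that output i has the form f_i(x) = tau(x_i, h_i), where the hidden value h_i is computed from every coordinate except x_i. The names conditioner, transformer, W, and the specific formulas are hypothetical choices made for illustration. Because h_i does not depend on x_i, the dimension-wise derivative df_i/dx_i equals the partial derivative of tau with h_i held fixed, so the divergence needs one cheap pass instead of a full Jacobian. The derivative is written by hand here; in an automatic differentiation framework the same effect would presumably come from blocking gradients through h_i in the backward graph.

```python
import math

def conditioner(x, i, W):
    """Hypothetical h_i: depends on every coordinate of x except x_i."""
    # Skipping j == i gives dh_i/dx_i = 0, i.e. a hollow Jacobian for h.
    return math.tanh(sum(W[i][j] * x[j] for j in range(len(x)) if j != i))

def transformer(xi, hi):
    """Hypothetical tau(x_i, h_i): combines coordinate i with its hidden value."""
    return math.sin(xi) * hi + xi ** 2

def d_transformer_dxi(xi, hi):
    """Partial derivative of tau with respect to x_i, holding h_i constant."""
    return math.cos(xi) * hi + 2.0 * xi

def f(x, W):
    """Full network output: f_i(x) = tau(x_i, h_i(x without x_i))."""
    return [transformer(x[i], conditioner(x, i, W)) for i in range(len(x))]

def dimwise_derivatives(x, W):
    """Diagonal of the Jacobian, df_i/dx_i, without forming the Jacobian."""
    return [d_transformer_dxi(x[i], conditioner(x, i, W)) for i in range(len(x))]

def divergence(x, W):
    """Divergence of f: the sum of the dimension-wise derivatives."""
    return sum(dimwise_derivatives(x, W))

def finite_difference_diagonal(x, W, eps=1e-6):
    """Reference check: central differences on f, one coordinate at a time."""
    diag = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        diag.append((f(xp, W)[i] - f(xm, W)[i]) / (2.0 * eps))
    return diag

# Arbitrary illustrative weights and input (the diagonal of W is never used).
W = [[0.0, 0.5, -0.3], [0.2, 0.0, 0.7], [-0.4, 0.1, 0.0]]
x = [0.3, -1.2, 0.8]

cheap = dimwise_derivatives(x, W)
reference = finite_difference_diagonal(x, W)
```

Comparing cheap with reference confirms that holding h_i fixed recovers the true Jacobian diagonal for this toy function. The finite-difference check calls f twice per coordinate, while the dimension-wise route evaluates each h_i once, which is the kind of saving the restricted architecture is designed to provide.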