Towards Understanding Gradient Approximation in Equality Constrained Deep Declarative Networks
This work addresses a computational bottleneck in training deep learning models with constraints and offers a practical solution for researchers and practitioners. The contribution is incremental, since it builds on the existing deep declarative network framework.
The paper investigates when the gradient of a deep declarative node with equality constraints can be approximated by ignoring the constraint terms while still providing a descent direction for the global loss. It analyzes linear equality constraints and normalization constraints theoretically, notes that the approximation is often much cheaper to compute than the true gradient, and reports both examples where it works well in practice and cases where it fails.
We explore conditions for when the gradient of a deep declarative node can be approximated by ignoring constraint terms and still result in a descent direction for the global loss function. This has important practical applications when training deep learning models, since the approximation is often much more computationally efficient than the true gradient calculation. We provide theoretical analysis for problems with linear equality constraints and normalization constraints, and show examples where the approximation works well in practice as well as some cautionary tales for when it fails.
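
As a minimal sketch of the idea, consider a toy declarative node that is not taken from the paper: y(x) = argmin_u 0.5*||u - x||^2 subject to A u = b, with A and b fixed. The matrix A, the vector b, the helper name solve_node, and the random stand-in v for the downstream gradient dL/dy are all assumptions made for illustration. For this node the true Jacobian dy/dx is the projector P onto the null space of A. Assuming that "ignoring constraint terms" means keeping only the unconstrained part of the implicit derivative, the approximate Jacobian here is the identity. The snippet compares the two back-propagated gradients and checks that their inner product is non-negative, so in this particular case a step along the negative approximate gradient does not increase the loss to first order.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2                       # hypothetical sizes: 5 variables, 2 constraints
A = rng.standard_normal((m, n))   # assumed fixed constraint matrix (full row rank)
b = rng.standard_normal(m)        # assumed fixed right-hand side

def solve_node(x):
    """Toy node: y(x) = argmin_u 0.5*||u - x||^2  subject to  A u = b."""
    # Closed-form Euclidean projection of x onto the affine set {u : A u = b}.
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

x = rng.standard_normal(n)
y = solve_node(x)
v = rng.standard_normal(n)        # stand-in for the downstream gradient dL/dy

# True Jacobian dy/dx: orthogonal projector onto the null space of A.
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)

g_true = P.T @ v                  # exact back-propagated gradient dL/dx
g_approx = v                      # constraint terms ignored: Jacobian taken as I

# A non-negative inner product means -g_approx is not an ascent direction.
inner = float(g_approx @ g_true)
print("constraint residual:", np.linalg.norm(A @ y - b))
print("<g_approx, g_true> =", inner)
print("||g_true||^2       =", float(g_true @ g_true))
```

Because P is symmetric and idempotent, the inner product equals ||P v||^2, which is why the last two printed values agree. This only illustrates one simple linear equality constrained case; it says nothing about nodes where the objective or constraints couple x and u in other ways.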