LG ML · Aug 15, 2020

Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

Chengcheng Wan, Henry Hoffmann, Shan Lu, Michael Maire

arXiv:2008.06635v1 · 5.0 · 11 citations

Originality Highly original

AI Analysis

This work addresses the challenge of efficiently training anytime neural networks. It is incremental in that it builds on existing branched and nested architectures and adds a new optimizer.

The paper tackles the problem of training neural networks that produce increasingly accurate outputs over time (anytime behavior) by proposing Orthogonalized SGD, a novel optimizer that dynamically re-balances task-specific gradients to prevent interference between early and later outputs. Experiments show that this approach significantly improves the generalization accuracy of anytime networks.
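To make "anytime behavior" concrete, here is a minimal sketch of a staged predictor that emits one output per stage and can be stopped after any of them. It is a toy illustration, not the paper's architecture: the names `anytime_predict`, `stages`, `heads`, and `budget` are hypothetical, and the stages are plain Python callables rather than neural network layers.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a stage refines the shared internal state,
# and a head turns the current state into a prediction.
Stage = Callable[[float], float]
Head = Callable[[float], float]


def anytime_predict(
    x: float,
    stages: Sequence[Stage],
    heads: Sequence[Head],
    budget: int,
) -> List[float]:
    """Run at most `budget` stages and return every output produced so far.

    Each stage reuses the state left by the previous one, so stopping
    early still yields a usable (if less accurate) prediction.
    """
    state = x
    outputs: List[float] = []
    for stage, head in zip(stages[:budget], heads[:budget]):
        state = stage(state)         # refine the shared internal state
        outputs.append(head(state))  # early exit: predict from current state
    return outputs
```

A caller with a tight deadline would pass a small `budget` and use the last element of the returned list; a caller with more time gets later, more refined outputs from the same shared state.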

We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant for both immediate prediction and refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.
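The projection idea in the abstract can be sketched with Gram-Schmidt orthogonalization over per-output gradients, ordered from earliest to latest output. This is a sketch under stated assumptions, not the authors' implementation: it treats each output's gradient as one flat vector, and the paper may apply the projection at a different granularity or with additional normalization. The function names `orthogonalize` and `orthogonalized_sgd_step` and the learning rate `lr` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(grads: Sequence[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Project each gradient away from the directions of all earlier ones.

    `grads[0]` belongs to the earliest output and is kept unchanged;
    each later gradient loses its components along earlier directions.
    """
    basis: List[Vector] = []      # mutually orthogonal directions seen so far
    projected: List[Vector] = []
    for g in grads:
        r = list(g)
        for b in basis:
            coef = dot(r, b) / dot(b, b)
            r = [ri - coef * bi for ri, bi in zip(r, b)]
        projected.append(r)
        if dot(r, r) > eps:       # skip (near-)zero residuals as basis vectors
            basis.append(r)
    return projected


def orthogonalized_sgd_step(
    params: Sequence[float],
    grads: Sequence[Sequence[float]],
    lr: float = 0.1,
) -> Vector:
    """One SGD step using the sum of the orthogonalized gradients (assumed rule)."""
    projected = orthogonalize(grads)
    update = [sum(components) for components in zip(*projected)]
    return [p - lr * u for p, u in zip(params, update)]
```

With gradients `[1, 0]` for the first output and `[1, 1]` for the second, the second is reduced to `[0, 1]`, so its update no longer moves the parameters along the direction the first output depends on.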
