LG CV | Mar 16, 2022

Gradient Correction beyond Gradient Descent

Zefan Li, Bingbing Ni, Teng Li, WenJun Zhang, Wen Gao

arXiv:2203.08345v2 | 1.8 | h-index: 60

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in neural network optimization for researchers and practitioners, offering an incremental improvement over existing gradient-descent methods.

The paper tackles the problem of gradient quality degradation in neural network training by introducing a gradient correction framework (GCGD) that reduces training epochs by approximately 20% and improves network performance.

The great success neural networks have achieved is inseparable from the application of gradient-descent (GD) algorithms. Many GD variants have emerged to improve the optimization process. The gradient used in back-propagation is arguably the most crucial factor in training a neural network. The quality of the computed gradient can be affected by several factors, e.g., noisy data, calculation error, and algorithmic limitations. To reveal gradient information beyond gradient descent, we introduce a framework (GCGD) to perform gradient correction. GCGD consists of two plug-in modules: 1) inspired by the idea of gradient prediction, we propose a GC-W module for weight gradient correction; 2) based on Neural ODE, we propose a GC-ODE module for hidden-state gradient correction. Experimental results show that our gradient correction framework can effectively improve gradient quality, reducing training epochs by ~20% and also improving network performance.
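The abstract does not give the update rules of GC-W or GC-ODE, so the following is only a toy sketch of the general plug-in idea: a raw, noisy gradient is blended with a predicted gradient before the weight update. The prediction here is an exponential moving average of past gradients, which is a stand-in chosen for illustration and not the paper's GC-W module. The names `noisy_grad`, `correct_gradient`, and `train`, and the parameters `alpha` and `beta`, are all hypothetical.

```python
import random


def noisy_grad(w, rng, noise=0.5):
    """Gradient of f(w) = 0.5 * w**2 plus Gaussian noise (stand-in for noisy data)."""
    return w + rng.gauss(0.0, noise)


def correct_gradient(raw, predicted, alpha=0.5):
    """Blend the raw gradient with a predicted one.

    Hypothetical correction rule for illustration only; NOT the paper's GC-W.
    alpha = 1.0 returns the raw gradient unchanged.
    """
    return alpha * raw + (1.0 - alpha) * predicted


def train(steps=200, lr=0.1, alpha=0.5, beta=0.9, seed=0, use_correction=True):
    """Run gradient descent on the toy objective and return the final weight."""
    rng = random.Random(seed)
    w = 5.0
    predicted = 0.0  # assumed predictor: exponential moving average of past gradients
    for _ in range(steps):
        raw = noisy_grad(w, rng)
        # The correction is a plug-in step: the optimizer itself is unchanged.
        g = correct_gradient(raw, predicted, alpha) if use_correction else raw
        w -= lr * g
        predicted = beta * predicted + (1.0 - beta) * raw
    return w


if __name__ == "__main__":
    print("plain GD:    ", train(use_correction=False))
    print("corrected GD:", train(use_correction=True))
```

The correction sits between the gradient computation and the weight update, which mirrors the plug-in structure the abstract describes. The sketch makes no claim about reproducing the reported ~20% reduction in training epochs.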
