Identity Connections in Residual Nets Improve Noise Stability
This paper addresses the degradation problem in deep networks for computer vision and offers insight into why some networks generalize better than others, although the contribution is incremental because it builds on the existing ResNet framework.
The paper asks why Residual Neural Networks (ResNets) outperform plain networks without residual connections (PlnNets). It shows that simplified versions of the two are exactly equivalent in expressive power for the same computational complexity, and it conjectures that ResNets generalize better because they have better noise stability, supporting the conjecture empirically on both simplified and full networks.
Residual Neural Networks (ResNets) achieve state-of-the-art performance in many computer vision problems. Compared to plain networks without residual connections (PlnNets), ResNets train faster, generalize better, and suffer less from the so-called degradation problem. We introduce simplified (but still nonlinear) versions of ResNets and PlnNets for which these discrepancies still hold, although to a lesser degree. We establish a 1-1 mapping between simplified ResNets and simplified PlnNets, and show that they are exactly equivalent to each other in expressive power for the same computational complexity. We conjecture that ResNets generalize better because they have better noise stability, and empirically support it for both simplified and fully-fledged networks.
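The equivalence claim can be illustrated with a toy sketch. The abstract does not specify the form of the simplified networks, so the layer definitions here are assumptions made only for illustration: a hypothetical residual layer relu(x + Wx) and a hypothetical plain layer relu(Vx), related by V = I + W. The helper names (`residual_layer`, `plain_layer`, `to_plain`, `noise_sensitivity`) are likewise invented for this example, and the noise probe is a generic input-perturbation measure, not necessarily the definition of noise stability used in the paper.

```python
import math
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def residual_layer(w, x):
    # Assumed simplified residual layer: relu(x + W x).
    return relu([a + b for a, b in zip(x, matvec(w, x))])

def plain_layer(v, x):
    # Assumed simplified plain layer: relu(V x).
    return relu(matvec(v, x))

def to_plain(w):
    # Assumed one-to-one mapping between the two parameterizations: V = I + W.
    n = len(w)
    return [[w[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def forward(layer, weights, x):
    for w in weights:
        x = layer(w, x)
    return x

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def noise_sensitivity(layer, weights, x, sigma, rng, trials=200):
    # Mean relative change in the output when Gaussian noise with
    # standard deviation sigma is added to the input.
    clean = forward(layer, weights, x)
    total = 0.0
    for _ in range(trials):
        noisy_x = [a + rng.gauss(0.0, sigma) for a in x]
        noisy = forward(layer, weights, noisy_x)
        diff = norm([p - q for p, q in zip(noisy, clean)])
        total += diff / (norm(clean) + 1e-12)
    return total / trials

rng = random.Random(0)
dim, depth = 8, 6
res_weights = [[[rng.gauss(0.0, 0.1) for _ in range(dim)]
                for _ in range(dim)] for _ in range(depth)]
plain_weights = [to_plain(w) for w in res_weights]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Under the assumed mapping, both networks compute the same function,
# up to floating-point rounding.
out_res = forward(residual_layer, res_weights, x)
out_plain = forward(plain_layer, plain_weights, x)
max_gap = max(abs(a - b) for a, b in zip(out_res, out_plain))

sensitivity = noise_sensitivity(residual_layer, res_weights, x, 0.05, rng)
print(max_gap, sensitivity)
```

Because the two parameterizations in this sketch describe the same set of functions, any difference in noise stability would have to come from which weights training actually finds, not from expressive power. A probe like `noise_sensitivity` would therefore be applied to separately trained networks, which this toy example does not attempt.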