The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs
The paper addresses the challenge of designing scalable deep learning models, a concern for both researchers and practitioners. Its contribution is incremental: it refines existing concepts rather than introducing a new paradigm.
This paper examines how architectural topology affects trainability in deep CNNs. It introduces the concept of effective depth, which predicts scaling potential and optimization stability better than nominal depth. Empirical results show that architectures such as ResNet and GoogLeNet maintain optimization stability, while sequential ones such as VGG suffer from gradient attenuation.
This paper investigates the relationship between convolutional neural network (CNN) depth and image recognition performance through a comparative study of the VGG, ResNet, and GoogLeNet architectural families. By evaluating these models under a unified experimental framework on upscaled CIFAR-10 data, we isolate the effects of depth from confounding implementation variables. We introduce a formal distinction between nominal depth ($D_{\mathrm{nom}}$), the total count of weight-bearing layers, and effective depth ($D_{\mathrm{eff}}$), an operational metric representing the expected number of sequential transformations encountered across all feasible forward paths. As derived in Section 3, $D_{\mathrm{eff}}$ is computed through topology-specific proxies: the total sequential layer count for plain networks, the arithmetic mean of the minimum and maximum path lengths for residual structures, and the sum of average branch depths for multi-branch modules. Our empirical results demonstrate that sequential architectures such as VGG suffer from diminishing returns and severe gradient attenuation as $D_{\mathrm{nom}}$ increases, whereas architectures with identity shortcuts or branching modules maintain optimization stability. They achieve this stability by decoupling $D_{\mathrm{eff}}$ from $D_{\mathrm{nom}}$, which keeps the functional depth manageable for gradient propagation. We conclude that effective depth is a better predictor of a network's scaling potential and practical trainability than traditional layer counts, and that it provides a principled framework for future architectural innovation.
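The three topology-specific proxies named in the abstract can be illustrated with a minimal Python sketch. It follows only the one-line descriptions given here, not the full derivation in Section 3. The function names, the representation of a multi-branch network as a list of per-module branch depths, and all numeric inputs are hypothetical choices made for illustration.

```python
from statistics import mean


def d_eff_plain(num_layers: int) -> float:
    """Plain (sequential) network: D_eff is the total sequential layer count."""
    return float(num_layers)


def d_eff_residual(min_path: int, max_path: int) -> float:
    """Residual network: arithmetic mean of the shortest and longest path lengths.

    min_path and max_path are supplied by the caller; how they are counted
    for a given architecture is an assumption left outside this sketch.
    """
    return (min_path + max_path) / 2


def d_eff_multibranch(modules: list[list[int]]) -> float:
    """Multi-branch network: sum over modules of the average branch depth.

    Each inner list holds the depths of the parallel branches in one module
    (a hypothetical encoding, not taken from the paper).
    """
    return float(sum(mean(branch_depths) for branch_depths in modules))


# Illustrative inputs only; these are not measurements from the paper.
print(d_eff_plain(16))                                   # 16.0
print(d_eff_residual(2, 18))                             # 10.0
print(d_eff_multibranch([[1, 2, 2, 2], [1, 2, 2, 2]]))   # 3.5
```

In this toy setting the plain network has $D_{\mathrm{eff}} = D_{\mathrm{nom}}$, while the residual and multi-branch proxies yield values below the corresponding layer counts, which is the decoupling the abstract describes.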