Take it in your stride: Do we need striding in CNNs?
The work provides a foundational mathematical perspective that could simplify the theoretical analysis of CNNs, though the contribution is incremental in that it clarifies an existing operator.
The paper addresses the unclear mathematical role of striding in CNNs. It shows theoretically that a CNN with striding can be represented as an equivalent non-striding CNN with more filters and smaller size, which characterizes striding as a parameter-sharing mechanism that reduces training complexity.
Since their inception, CNNs have utilized some type of striding operator to reduce the overlap of receptive fields and the spatial dimensions. Although striding has clear heuristic motivations (i.e. lowering the number of parameters to learn), its mathematical role within CNN learning remains unclear. This paper offers a novel and mathematically rigorous perspective on the role of the striding operator within modern CNNs. Specifically, we demonstrate theoretically that one can always represent a CNN that incorporates striding with an equivalent non-striding CNN which has more filters and smaller size. Through this equivalence we are then able to characterize striding as an additional mechanism for parameter sharing among channels, thus reducing training complexity. Finally, the framework presented in this paper offers a new mathematical perspective on the role of striding, which we hope will facilitate and simplify the future theoretical analysis of CNNs.
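To make the flavor of such an equivalence concrete, the following is a minimal one-dimensional sketch based on the standard polyphase identity. It is not necessarily the construction used in the paper. It assumes a single-channel input, "valid" cross-correlation, and signal and filter lengths that are multiples of the stride. The function names `strided_corr`, `polyphase`, and `unstrided_equivalent` are hypothetical and introduced only for illustration. Under these assumptions, a stride-s filter of length K gives the same output as s stride-1 filters of length K/s, each applied to one phase of the input, with the results summed.

```python
def strided_corr(x, w, stride):
    """Valid 1-D cross-correlation of x with w, evaluated every `stride` samples."""
    n_out = (len(x) - len(w)) // stride + 1
    return [
        sum(w[k] * x[stride * n + k] for k in range(len(w)))
        for n in range(n_out)
    ]


def polyphase(seq, stride):
    """Split seq into `stride` phases; phase p holds seq[p], seq[p + stride], ..."""
    return [seq[p::stride] for p in range(stride)]


def unstrided_equivalent(x, w, stride):
    """Compute the same output using only stride-1 correlations.

    Assumption for this sketch: len(x) and len(w) are multiples of `stride`.
    The input is rearranged into `stride` channels (its phases), and the single
    long filter becomes `stride` shorter filters, one per channel.
    """
    assert len(x) % stride == 0 and len(w) % stride == 0
    n_out = (len(x) - len(w)) // stride + 1
    out = [0] * n_out
    for x_phase, w_phase in zip(polyphase(x, stride), polyphase(w, stride)):
        y_phase = strided_corr(x_phase, w_phase, 1)  # stride-1, shorter filter
        for n in range(n_out):
            out[n] += y_phase[n]  # sum the contributions over channels
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    w = [1, 0, -1, 2]
    print(strided_corr(x, w, 2))
    print(unstrided_equivalent(x, w, 2))
```

Both calls print the same list. In this toy setting the total number of weights is unchanged; the sketch illustrates only the "more filters, smaller size" rearrangement, not the parameter-sharing analysis itself.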