Defects of Convolutional Decoder Networks in Frequency Representation
This work identifies fundamental limitations of convolutional decoder networks in tasks that require accurate frequency representation. The contribution is incremental, since it builds on known issues, but it provides formal proofs.
The paper proves that cascaded convolutional decoder networks have inherent defects in representing frequency components, including weakening high frequencies due to convolution and zero-padding, generating repetitive signals from upsampling, and failing to learn effectively when input and output frequencies are slightly shifted.
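The repetition caused by upsampling can be checked numerically. The sketch below assumes upsampling by zero insertion (placing zeros between neighbouring samples). That is one common formulation, and the paper's exact upsampling operator may differ. The function name `zero_insert_upsample` is hypothetical. Under this assumption, the spectrum of the upsampled map is the input spectrum tiled periodically, so every frequency component of the input reappears at several frequencies of the output.

```python
import numpy as np

def zero_insert_upsample(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Hypothetical helper: upsample a 2D map by inserting (factor - 1) zeros
    between neighbouring samples along each axis."""
    h, w = x.shape
    y = np.zeros((h * factor, w * factor), dtype=x.dtype)
    y[::factor, ::factor] = x  # original samples land on a sparse grid
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = zero_insert_upsample(x, factor=2)

X = np.fft.fft2(x)  # 8 x 8 spectrum of the input map
Y = np.fft.fft2(y)  # 16 x 16 spectrum of the upsampled map

# For zero insertion, Y[u, v] = X[u % 8, v % 8]: the spectrum is tiled 2 x 2,
# so each input frequency component appears at four output frequencies.
tiled = np.tile(X, (2, 2))
print(np.allclose(Y, tiled))  # True
```

For example, the component at frequency (3, 5) of the input shows up with the same value at (3, 5), (11, 5), (3, 13), and (11, 13) of the upsampled spectrum.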
In this paper, we prove the representation defects of a cascaded convolutional decoder network in terms of its capacity to represent different frequency components of an input sample. We apply the discrete Fourier transform to each channel of the feature map in an intermediate layer of the decoder network. Then, we extend the 2D circular convolution theorem to represent the forward and backward propagations through convolutional layers in the frequency domain. Based on this, we prove three defects in representing feature spectra. First, we prove that the convolution operation, the zero-padding operation, and a set of other settings all make a convolutional decoder network more likely to weaken high-frequency components. Second, we prove that the upsampling operation generates a feature spectrum in which strong signals repetitively appear at certain frequencies. Third, we prove that if the frequency components of the input sample and those of the target output for regression differ by a small shift, then the decoder usually cannot be effectively learned.
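The frequency-domain view of a convolutional layer rests on the 2D circular convolution theorem applied channel by channel. The sketch below illustrates only the forward pass, under stated assumptions: stride 1, a true circular convolution (not the cross-correlation most frameworks implement), and a per-output-channel bias. The paper's own extension, including its treatment of zero-padding and backpropagation, may be formulated differently. Both function names are hypothetical. In the frequency domain, each output channel becomes a per-frequency weighted sum of the input channel spectra, and the bias only affects the DC component.

```python
import numpy as np

def circular_conv_layer(x, kernel, bias):
    """Hypothetical reference: multi-channel 2D circular convolution in the spatial domain.
    x: (C_in, H, W), kernel: (C_out, C_in, k, k), bias: (C_out,)."""
    c_out, c_in, k, _ = kernel.shape
    _, h, w = x.shape
    y = np.zeros((c_out, h, w))
    for o in range(c_out):
        for c in range(c_in):
            for i in range(k):
                for j in range(k):
                    # np.roll gives x[c][(m - i) % H, (n - j) % W] at position (m, n)
                    y[o] += kernel[o, c, i, j] * np.roll(x[c], shift=(i, j), axis=(0, 1))
        y[o] += bias[o]
    return y

def conv_layer_in_frequency(x, kernel, bias):
    """Hypothetical helper: the same layer expressed on the DFT of each channel."""
    _, h, w = x.shape
    X = np.fft.fft2(x, axes=(-2, -1))                   # (C_in, H, W) channel spectra
    K = np.fft.fft2(kernel, s=(h, w), axes=(-2, -1))    # kernel zero-padded to H x W
    Y = np.einsum("ocuv,cuv->ouv", K, X)                # sum over input channels per frequency
    Y[:, 0, 0] += bias * h * w                          # a constant bias only hits frequency (0, 0)
    return Y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 16))
kernel = rng.standard_normal((4, 3, 3, 3))
bias = rng.standard_normal(4)

y_spatial = circular_conv_layer(x, kernel, bias)
Y_freq = conv_layer_in_frequency(x, kernel, bias)
print(np.allclose(np.fft.fft2(y_spatial, axes=(-2, -1)), Y_freq))  # True
```

Because each frequency (u, v) of the output depends only on the same frequency (u, v) of the inputs, the kernel spectrum acts as a per-frequency gain; this is the kind of structure that lets one reason about which frequency components a stack of layers amplifies or weakens.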