Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning
This work addresses inefficiencies in pruning criteria for compressing CNNs, though it appears incremental by refining existing methods rather than introducing a new paradigm.
The paper identifies two blind spots in channel pruning criteria for CNNs: similarity, where several widely used criteria rank filters almost identically and so yield similar pruned structures, and applicability, where some criteria assign Importance Scores too close together to separate redundant filters from useful ones. To analyze both, it proposes the Convolutional Weight Distribution Assumption, which the authors verify through statistical tests.
Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), and various pruning criteria have been proposed to remove redundant filters. From our comprehensive experiments, we found two blind spots in the study of pruning criteria: (1) Similarity: There are strong similarities among several primary pruning criteria that are widely cited and compared. Under these criteria, the ranks of the filters' Importance Scores are almost identical, resulting in similar pruned structures. (2) Applicability: The filters' Importance Scores measured by some pruning criteria are too close to distinguish the network redundancy well. In this paper, we analyze these two blind spots on different types of pruning criteria with layer-wise pruning or global pruning. The analyses are based on empirical experiments and on our assumption (the Convolutional Weight Distribution Assumption) that the well-trained convolutional filters in each layer approximately follow a Gaussian-like distribution. This assumption has been verified through systematic and extensive statistical tests.
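To make the two blind spots concrete, the following is a minimal sketch on synthetic data, not a reproduction of the paper's experiments. It assumes one layer whose weights are i.i.d. Gaussian (a simplified reading of the assumption above) and uses the L1 and L2 filter norms as two example criteria. All function names, the layer shape, and the value of `sigma` are hypothetical choices made for illustration.

```python
import math
import random


def l1_score(filt):
    """Importance Score of a filter as the L1 norm of its weights."""
    return sum(abs(w) for w in filt)


def l2_score(filt):
    """Importance Score of a filter as the L2 norm of its weights."""
    return math.sqrt(sum(w * w for w in filt))


def ranks(values):
    """Rank of each value (0 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length, tie-free sequences."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def make_layer(num_filters, weights_per_filter, sigma=0.05, seed=0):
    """Synthetic layer: every weight drawn i.i.d. from N(0, sigma^2).

    This is an illustrative assumption, not trained weights.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, sigma) for _ in range(weights_per_filter)]
        for _ in range(num_filters)
    ]


# Hypothetical layer: 128 filters, each with 3 x 3 x 32 weights.
layer = make_layer(num_filters=128, weights_per_filter=3 * 3 * 32)
l1 = [l1_score(f) for f in layer]
l2 = [l2_score(f) for f in layer]

# Similarity: how closely do the two criteria agree on the filter ranking?
rho = spearman(l1, l2)

# Applicability: how wide is the range of scores relative to their mean?
spread = (max(l2) - min(l2)) / (sum(l2) / len(l2))

print(f"Spearman correlation between L1 and L2 ranks: {rho:.3f}")
print(f"Relative spread of L2 scores: {spread:.3f}")
```

On such synthetic weights the rank correlation comes out high and the relative spread small, which is the pattern the abstract describes. Whether trained networks behave the same way is the empirical question the paper addresses; this sketch only illustrates how the two quantities could be measured.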