Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization
This work addresses the challenge of efficient neural architecture search and pruning for AI practitioners, offering an incremental improvement in automating layer-width optimization for tasks like image classification, segmentation, and translation.
The paper tackles the problem of automatically determining per-layer sparsity in neural network pruning. It proposes Pruning-as-Search (PaS), an end-to-end channel pruning method that learns pruning policies through gradient descent. The resulting family of networks achieves around 1.0% higher top-1 accuracy than prior methods on ImageNet-1000 classification at similar inference speed.
Neural architecture search (NAS) and network pruning are widely studied techniques for efficient AI, but neither is yet perfect. NAS performs an exhaustive search over candidate architectures, incurring tremendous search cost. Though (structured) pruning can simply shrink model dimensions, it remains unclear how to decide the per-layer sparsity automatically and optimally. In this work, we revisit the problem of layer-width optimization and propose Pruning-as-Search (PaS), an end-to-end channel pruning method that searches out the desired sub-network automatically and efficiently. Specifically, we add a depth-wise binary convolution to learn pruning policies directly through gradient descent. By combining structural reparameterization with PaS, we successfully search out a new family of VGG-like, lightweight networks, which allow arbitrary width for each layer instead of each stage. Experimental results show that our proposed architecture outperforms prior art by around $1.0\%$ top-1 accuracy under similar inference speed on the ImageNet-1000 classification task. Furthermore, we demonstrate the effectiveness of our width search on complex tasks including instance segmentation and image translation. Code and models are released.
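To make the idea of a depth-wise binary convolution that learns a pruning policy by gradient descent more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how the binarization or its gradient is handled, so the 0.5 threshold, the straight-through estimator, and all names (`binarize`, `forward`, `weight_grad_ste`, `sgd_step`) are assumptions chosen for illustration. A depth-wise 1x1 convolution reduces to one scalar per channel, so a binarized weight of 0 or 1 acts as a keep/prune switch for that channel.

```python
from typing import List

THRESHOLD = 0.5  # assumed binarization threshold; not stated in the abstract


def binarize(weights: List[float], threshold: float = THRESHOLD) -> List[int]:
    """Map each real-valued per-channel weight to a 0/1 keep mask."""
    return [1 if w > threshold else 0 for w in weights]


def forward(x: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Depth-wise 1x1 'binary convolution': scale channel c by its 0/1 mask."""
    mask = binarize(weights)
    return [[m * v for v in channel] for m, channel in zip(mask, x)]


def weight_grad_ste(x: List[List[float]],
                    grad_out: List[List[float]]) -> List[float]:
    """Gradient w.r.t. the real-valued weights under an assumed
    straight-through estimator: d(mask)/d(weight) is treated as 1, so
    dL/dw_c = sum_i grad_out[c][i] * x[c][i]."""
    return [sum(g * v for g, v in zip(gc, xc)) for gc, xc in zip(grad_out, x)]


def sgd_step(weights: List[float], grads: List[float],
             lr: float = 0.1) -> List[float]:
    """Plain gradient-descent update on the real-valued weights."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Toy feature map: 3 channels, 2 activations per channel.
x = [[1.0, 2.0], [0.5, -0.5], [3.0, 1.0]]
weights = [0.9, 0.55, 0.2]          # learnable, real-valued
mask = binarize(weights)            # channels 0 and 1 kept, channel 2 pruned
y = forward(x, weights)

# Pretend the loss gradient pushes against channel 1 only.
grad_out = [[0.0, 0.0], [1.0, -1.0], [0.0, 0.0]]
grads = weight_grad_ste(x, grad_out)
new_weights = sgd_step(weights, grads)
new_mask = binarize(new_weights)    # channel 1 drops below the threshold
```

In this toy run, a single gradient step moves the weight of channel 1 across the threshold, so the layer's effective width shrinks from two channels to one. This illustrates how a per-layer width could emerge from ordinary training rather than from a separate search loop.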