A Passive Similarity based CNN Filter Pruning for Efficient Acoustic Scene Classification
This work addresses the challenge of deploying acoustic scene classification models on resource-constrained devices, but the contribution is incremental because it builds on existing pruning techniques.
The paper tackles the high computational complexity of CNNs for acoustic scene classification by proposing a passive filter pruning method. On the DCASE 2021 Task 1A baseline network, the method reduces computations per inference by 27% and parameters by 25%, with less than a 1% drop in accuracy.
We present a method to develop low-complexity convolutional neural networks (CNNs) for acoustic scene classification (ASC). The large size and high computational complexity of typical CNNs are a bottleneck for their deployment on resource-constrained devices. We propose a passive filter pruning framework, in which a few convolutional filters are eliminated from the CNNs to yield compressed CNNs. Our hypothesis is that similar filters produce similar responses and carry redundant information, so such filters can be eliminated from the network. To identify similar filters, we propose a greedy algorithm based on cosine distance. A fine-tuning process is then performed to regain much of the performance lost due to filter elimination. To make fine-tuning efficient, we analyze how the performance varies with the number of fine-tuning training examples. We evaluate the proposed framework experimentally on the publicly available DCASE 2021 Task 1A baseline network trained for ASC. The proposed method is simple and reduces computations per inference by 27% and the number of parameters by 25%, with less than a 1% drop in accuracy.
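The abstract does not spell out the greedy procedure, so the following is only a minimal sketch of one way a cosine-distance-based greedy selection of similar filters could look. The function names (`cosine_distance_matrix`, `select_redundant_filters`, `prune_layer`), the rule of dropping the higher-indexed filter of the closest pair, and the fixed pruning budget `n_prune` are all illustrative assumptions, not the authors' algorithm. Because the selection uses only the filter weights and no training data, it is consistent with the "passive" framing above.

```python
import numpy as np


def cosine_distance_matrix(filters):
    """Pairwise cosine distance between flattened filters.

    filters: array of shape (n_filters, in_channels, kh, kw).
    """
    flat = filters.reshape(filters.shape[0], -1).astype(np.float64)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)  # guard against all-zero filters
    return 1.0 - unit @ unit.T


def select_redundant_filters(filters, n_prune):
    """Greedily choose n_prune filter indices to eliminate.

    At each step, find the closest pair among the remaining filters and
    drop one member of that pair (assumption: the higher index is dropped).
    """
    n_filters = filters.shape[0]
    if not 0 <= n_prune < n_filters:
        raise ValueError("n_prune must be in [0, n_filters - 1]")

    dist = cosine_distance_matrix(filters)
    np.fill_diagonal(dist, np.inf)  # a filter is never paired with itself

    pruned = []
    for _ in range(n_prune):
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        drop = int(max(i, j))
        pruned.append(drop)
        # Exclude the dropped filter from all later comparisons.
        dist[drop, :] = np.inf
        dist[:, drop] = np.inf
    return sorted(pruned)


def prune_layer(filters, n_prune):
    """Return the retained filters and the indices that were kept."""
    dropped = set(select_redundant_filters(filters, n_prune))
    keep = [k for k in range(filters.shape[0]) if k not in dropped]
    return filters[keep], keep


# Usage with random weights standing in for a trained convolutional layer.
rng = np.random.default_rng(0)
weights = rng.standard_normal((16, 8, 3, 3))
compressed, kept = prune_layer(weights, n_prune=4)
```

In a full pipeline, the input channels of the following layer that correspond to the dropped filters would also have to be removed, and the compressed network would then be fine-tuned as the abstract describes. Both steps are outside the scope of this sketch.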