CV · Mar 9, 2019

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

arXiv:1903.03777v2 · 172 citations
Originality: Incremental advance
AI Analysis

This work addresses the deployment challenge of balancing inference speed and accuracy for deep neural networks in real-world scenarios, offering a practical solution for embedded and GPU platforms.

The paper tackles the problem of optimizing neural networks for both speed and accuracy on specific hardware platforms, proposing Partial Order Pruning to automatically search architectures that achieve state-of-the-art trade-offs, such as DF-Seg networks for real-time segmentation.

Achieving a good speed/accuracy trade-off on a target platform is very important when deploying deep neural networks in real-world scenarios. However, most existing automatic architecture search approaches concentrate only on high performance. In this work, we propose an algorithm, termed "Partial Order Pruning", that offers a better speed/accuracy trade-off for the searched networks. It prunes the architecture search space using a partial order assumption, in order to automatically search for the architectures with the best speed/accuracy trade-off. Our algorithm explicitly takes profile information about the inference speed on the target platform into consideration. With the proposed algorithm, we present several Dongfeng (DF) networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching decoder architectures, our DF-Seg real-time segmentation networks yield a state-of-the-art speed/accuracy trade-off on both the target embedded device and the high-end GPU.
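The abstract does not spell out the pruning rule, so the following is only a toy sketch of one plausible reading of it, not the authors' implementation. It assumes that an architecture is a hypothetical `(depth, width)` pair, that "a precedes b" means a is no deeper and no wider than b, and that a predecessor is never slower and never more accurate than its successor. The functions `latency` and `train_and_eval` are made-up stand-ins for on-device profiling and for actual training. Under those assumptions, once a trained model is known to be at least as accurate as w, any untrained predecessor of w that is no faster than that model cannot improve the trade-off and can be skipped without training.

```python
from itertools import product

# Hypothetical toy search space: an architecture is a (depth, width) pair.
SPACE = list(product(range(1, 6), range(1, 6)))


def precedes(a, b):
    """Partial order: a is no deeper and no wider than b, and a != b."""
    return a != b and a[0] <= b[0] and a[1] <= b[1]


def latency(arch):
    # Stand-in for profiling on the target platform (arbitrary units).
    depth, width = arch
    return depth * width * width


def train_and_eval(arch):
    # Stand-in for training and validation; a made-up monotone score.
    depth, width = arch
    return 1.0 - 1.0 / (1 + depth * width)


def search(space, order):
    trained = {}    # arch -> (latency, accuracy)
    pruned = set()  # architectures skipped without training
    for w in order:
        if w in pruned:
            continue
        trained[w] = (latency(w), train_and_eval(w))
        for t, (_, acc_t) in trained.items():
            # Fastest trained model that is at least as accurate as t.
            bound = min(lat for lat, acc in trained.values() if acc >= acc_t)
            for x in space:
                # Assumed: acc(x) <= acc(t) for every predecessor x of t, so
                # x is dominated whenever it is no faster than that model.
                if x not in trained and precedes(x, t) and latency(x) >= bound:
                    pruned.add(x)
    return trained, pruned


def pareto_front(trained):
    """Trained architectures that no faster trained model matches in accuracy."""
    front, best_acc = [], float("-inf")
    ranked = sorted(trained.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    for arch, (_, acc) in ranked:
        if acc > best_acc:
            front.append(arch)
            best_acc = acc
    return front


# Visit narrow architectures first, deepest first within each width.
ORDER = sorted(SPACE, key=lambda a: (a[1], -a[0]))
trained, pruned = search(SPACE, ORDER)
print(len(trained), "trained,", len(pruned), "pruned")
print("front:", pareto_front(trained))
```

In this toy setting the latency of an untrained candidate is cheap to obtain while its accuracy is not, which is why the check uses measured latency together with the assumed accuracy ordering. How much gets pruned depends heavily on the visiting order and on how well the partial order assumption holds for the real search space.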

Code Implementations: 2 repos
