EfficientPose: Scalable single-person pose estimation
This work addresses the need for efficient pose estimation in real-world applications such as sports and clinical movement analysis, but its contribution is incremental: it builds on existing EfficientNets and mobile inverted bottleneck convolutions.
The paper tackles the inefficiency of single-person pose estimation models by proposing EfficientPose, a family of architectures that uses EfficientNets and mobile inverted bottleneck convolutions to improve efficiency and scalability. The results show that EfficientPose outperforms OpenPose in both accuracy and computational efficiency on the MPII single-person benchmark, with the top-performing model achieving state-of-the-art accuracy using low-complexity ConvNets.
Single-person human pose estimation facilitates markerless movement analysis in sports, as well as in clinical applications. Still, state-of-the-art models for human pose estimation generally do not meet the requirements of real-life applications. The proliferation of deep learning techniques has resulted in the development of many advanced approaches. However, as the field has progressed, more complex and less efficient models have also been introduced, causing tremendous increases in computational demands. To cope with these complexity and inefficiency challenges, we propose a novel convolutional neural network architecture, called EfficientPose, which exploits recently proposed EfficientNets to deliver efficient and scalable single-person pose estimation. EfficientPose is a family of models that combines an effective multi-scale feature extractor with computationally efficient detection blocks based on mobile inverted bottleneck convolutions, while still improving the precision of the estimated pose configurations. Owing to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost. The results from our experiments, using the challenging MPII single-person benchmark, show that the proposed EfficientPose models substantially outperform the widely used OpenPose model in terms of both accuracy and computational efficiency. In particular, our top-performing model achieves state-of-the-art accuracy on single-person MPII with low-complexity ConvNets.
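To give a rough sense of why detection blocks built from mobile inverted bottleneck convolutions can be cheaper than standard convolutions, the following minimal sketch counts weights for both. It is an illustration under stated assumptions, not the EfficientPose implementation: the block is simplified to a 1x1 expansion, a depthwise convolution, and a 1x1 projection, and the function names, channel width (128), kernel size (5), and expansion ratio (6) are hypothetical values chosen for the example rather than taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias and normalization ignored)."""
    return k * k * c_in * c_out


def mbconv_params(c_in, c_out, k, expand_ratio):
    """Weights in a simplified mobile inverted bottleneck block.

    Assumed structure: 1x1 expansion -> k x k depthwise -> 1x1 projection.
    Squeeze-and-excitation, bias, and normalization are left out.
    """
    c_mid = c_in * expand_ratio
    expand = c_in * c_mid        # 1x1 pointwise expansion
    depthwise = k * k * c_mid    # one k x k filter per expanded channel
    project = c_mid * c_out      # 1x1 pointwise projection
    return expand + depthwise + project


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(c_in=128, c_out=128, k=5)
mobile = mbconv_params(c_in=128, c_out=128, k=5, expand_ratio=6)

print(f"standard 5x5 convolution: {standard} weights")
print(f"simplified MBConv block:  {mobile} weights")
print(f"reduction factor:         {standard / mobile:.2f}x")
```

The saving depends on the kernel size and expansion ratio: with a small kernel and a large expansion ratio, the simplified block can hold more weights than a standard convolution of the same width, so the numbers above should be read as one configuration, not a general guarantee.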