CV · Jan 11, 2024

A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting

arXiv:2401.05968v1 · 4 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the efficiency challenges of deploying crowd-counting models in real-world applications. Its contribution is incremental because it builds on existing backbones and fusion techniques.

The paper tackles the problem of deploying crowd-counting models on resource-constrained devices. It introduces lightweight models with MobileNet and MobileViT backbones. These models achieve results comparable to state-of-the-art methods on datasets such as ShanghaiTech-A while being the most computationally efficient of the compared models.
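Part of what makes a MobileNet backbone cheap is its use of depthwise separable convolutions, introduced in the original MobileNet work. A depthwise separable convolution factors a standard convolution into two steps: a per-channel depthwise filter, then a 1×1 pointwise convolution.

The sketch below counts multiply-accumulate operations (MACs) for both variants and ignores bias, padding and stride. The function names and the example layer size are illustrative assumptions and are not taken from this paper.

```python
def standard_conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """MACs for a k x k standard convolution producing an h x w x c_out output."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """MACs for a k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv mapping c_in -> c_out channels."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 128 -> 256 channels, 56x56 output map.
    k, c_in, c_out, h, w = 3, 128, 256, 56, 56
    std = standard_conv_macs(k, c_in, c_out, h, w)
    sep = depthwise_separable_macs(k, c_in, c_out, h, w)
    print(f"standard:  {std:,} MACs")
    print(f"separable: {sep:,} MACs")
    # sep / std simplifies to 1/c_out + 1/k**2, so a 3x3 layer with many
    # output channels needs roughly 8-9x fewer MACs in separable form.
    print(f"ratio: {sep / std:.4f}")
```

Dividing the two counts shows where the saving comes from. The ratio is 1/c_out + 1/k², so for 3×3 kernels the cost approaches one ninth of a standard convolution as the channel count grows.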

Crowd counting has direct real-world applications, which makes both computational efficiency and performance crucial. However, most previous methods rely on a heavy backbone and a complex downstream architecture, which restricts deployment. To address this challenge and make crowd-counting models more versatile, we introduce two lightweight models. Both share the same downstream architecture but use different backbones: MobileNet and MobileViT. We use Adjacent Feature Fusion to extract features at diverse scales from a Pre-Trained Model (PTM) and then combine them. This approach lets our models achieve improved performance while keeping a compact and efficient design. We compare our models with previous state-of-the-art (SOTA) methods on the ShanghaiTech-A, ShanghaiTech-B, and UCF-CC-50 datasets. Our models achieve comparable results while being the most computationally efficient. Finally, we present a comparative study, an extensive ablation study, and pruning experiments to demonstrate the effectiveness of our models.
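The abstract does not spell out how Adjacent Feature Fusion combines the backbone's multi-scale features. The sketch below assumes one plausible reading, a top-down fusion in the spirit of feature pyramid networks. Starting from the coarsest stage, the running result is upsampled to the resolution of the adjacent finer stage and added to it. The sketch also assumes the common density-map formulation of crowd counting, in which the predicted count is the sum of the output map.

Several simplifications apply:
- All function names are hypothetical.
- Maps are single-channel lists of lists.
- Each stage is assumed to halve the resolution of the previous one.
- The learned convolutions a real model would apply before and after fusion are omitted.

```python
from typing import List

Map = List[List[float]]  # hypothetical single-channel 2-D feature map (rows x cols)


def upsample2x(fmap: Map) -> Map:
    """Nearest-neighbour 2x upsampling: each value fills a 2x2 block."""
    out: Map = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps with identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def adjacent_fusion(features: List[Map]) -> Map:
    """Fuse features ordered fine -> coarse (assumed: each stage has half the
    resolution of the previous one). Starting from the coarsest map, repeatedly
    upsample the running result and add the adjacent finer map."""
    fused = features[-1]
    for finer in reversed(features[:-1]):
        fused = add_maps(finer, upsample2x(fused))
    return fused


def predicted_count(density: Map) -> float:
    """Under the density-map assumption, the crowd count is the map's sum."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    # Toy stage outputs at 4x4, 2x2 and 1x1 resolution (made-up values).
    f1 = [[0.1] * 4 for _ in range(4)]
    f2 = [[0.2, 0.0], [0.0, 0.2]]
    f3 = [[0.05]]
    density = adjacent_fusion([f1, f2, f3])
    print(len(density), "x", len(density[0]))
    print(round(predicted_count(density), 4))
```

Fusing only neighbouring scales keeps each step cheap. Every addition involves two maps that differ by a single 2× resampling, which suits a lightweight design. A real implementation would operate on multi-channel tensors and typically align channel counts with 1×1 convolutions before adding.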
