NAS-Count: Counting-by-Density with Neural Architecture Search
This work addresses the scale variation problem in crowd counting, offering an automated alternative to manually designing multi-scale networks.
The paper tackles the problem of automating crowd counting model design by using Neural Architecture Search (NAS) to find an encoder-decoder architecture, AMSNet, which achieves state-of-the-art results on four datasets and outperforms hand-designed models.
Most recent advances in crowd counting have come from hand-designed density estimation networks, which leverage multi-scale features to address the scale variation problem at the cost of considerable manual design effort. In this work, we automate the design of counting models with Neural Architecture Search (NAS) and introduce an end-to-end searched encoder-decoder architecture, the Automatic Multi-Scale Network (AMSNet). Specifically, we use a counting-specific two-level search space: the encoder and decoder in AMSNet are composed of different cells discovered by a micro-level search, while the multi-path architecture is explored by a macro-level search. To address the pixel-level isolation of the MSE loss, which scores every pixel independently of its neighbours, AMSNet is optimized with an auto-searched Scale Pyramid Pooling Loss (SPPLoss) that supervises multi-scale structural information. Extensive experiments on four datasets show that AMSNet achieves state-of-the-art results, outperforming hand-designed models and demonstrating the efficacy of NAS-Count.
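The abstract does not give the exact form of SPPLoss, so the following is only a minimal sketch of the general idea under stated assumptions. Both the predicted and ground-truth density maps are pooled with non-overlapping max pooling at kernel sizes 1, 2 and 4, and the per-level MSEs are averaged with equal weights. The function names `pool2d` and `pyramid_pooling_loss`, the kernel sizes, the pooling type and the equal weighting are all hypothetical choices for illustration. In AMSNet the pooling configuration is itself found by search, and this is not the authors' implementation.

```python
import numpy as np


def pool2d(x: np.ndarray, k: int, mode: str = "max") -> np.ndarray:
    """Non-overlapping k x k pooling over a 2-D map.

    Trailing rows/columns that do not fill a whole window are cropped.
    """
    h, w = x.shape
    hk, wk = h // k, w // k
    # Split the map into (hk x wk) blocks of size k x k.
    blocks = x[: hk * k, : wk * k].reshape(hk, k, wk, k)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")


def pyramid_pooling_loss(pred, gt, kernels=(1, 2, 4), mode="max") -> float:
    """Hypothetical pyramid-pooling loss: mean over pyramid levels of the
    MSE between pooled predicted and ground-truth density maps.

    The kernel-size-1 level equals the ordinary pixel-wise MSE; coarser
    levels compare local regions instead of isolated pixels.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    level_losses = []
    for k in kernels:
        p = pool2d(pred, k, mode)
        g = pool2d(gt, k, mode)
        level_losses.append(np.mean((p - g) ** 2))
    return float(np.mean(level_losses))


if __name__ == "__main__":
    # One annotated person at the top-left corner of a 4 x 4 density map.
    gt = np.zeros((4, 4))
    gt[0, 0] = 1.0
    near = np.zeros((4, 4))
    near[0, 1] = 1.0  # prediction displaced by one pixel
    far = np.zeros((4, 4))
    far[3, 3] = 1.0  # prediction in the opposite corner

    # Pixel-wise MSE alone cannot tell the two errors apart ...
    print(pyramid_pooling_loss(near, gt, kernels=(1,)),
          pyramid_pooling_loss(far, gt, kernels=(1,)))
    # ... while the pyramid loss penalises the far displacement more.
    print(pyramid_pooling_loss(near, gt), pyramid_pooling_loss(far, gt))
```

In the toy example, the plain pixel-wise MSE gives the one-pixel shift and the far shift the same loss, because each pixel is scored in isolation. Only the coarser pooled levels separate them, which is the kind of multi-scale structural supervision the abstract attributes to SPPLoss.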