CrowdRec: 3D Crowd Reconstruction from Single Color Images
This work addresses the specific problem of 3D crowd reconstruction for computer vision applications, but it is incremental as it builds upon existing single-person methods with added constraints.
The paper tackles 3D crowd reconstruction from single color images, which is challenging because of occlusions and depth ambiguity. It proposes a crowd-constrained optimization that refines a single-person mesh recovery method, yielding accurate body poses and shapes with reasonable absolute positions in crowded scenes.
This is a technical report for the GigaCrowd challenge. Reconstructing 3D crowds from monocular images is a challenging problem due to mutual occlusions, severe depth ambiguity, and complex spatial distributions. Because no large-scale 3D crowd dataset is available to train a robust model, current multi-person mesh recovery methods can hardly achieve satisfactory performance in crowded scenes. In this paper, we exploit crowd features and propose a crowd-constrained optimization to improve a common single-person method on crowd images. To avoid scale variations, we first detect human bounding boxes and 2D poses from the original images with off-the-shelf detectors. Then, we train a single-person mesh recovery network using existing in-the-wild image datasets. To promote a more reasonable spatial distribution, we further propose a crowd constraint to refine the single-person network parameters. With this optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image using a single-person backbone. The code will be publicly available at~\url{https://github.com/boycehbz/CrowdRec}.
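As a rough illustration of how per-person detection can sidestep scale variations, the sketch below maps detected 2D keypoints from full-image coordinates into a fixed-size square crop around each bounding box, so that near and far people reach a single-person network at the same resolution. It is not taken from the CrowdRec code: the function names, the 224-pixel crop size, and the 1.2 margin are all assumptions chosen for illustration.

```python
import numpy as np


def square_crop_params(bbox, margin=1.2):
    """Return (center, side) of a square region around an (x1, y1, x2, y2) box.

    The margin value is an illustrative assumption, not a value from the paper.
    """
    x1, y1, x2, y2 = bbox
    center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    side = margin * max(x2 - x1, y2 - y1)
    return center, side


def to_crop_coords(keypoints, bbox, out_size=224, margin=1.2):
    """Map full-image 2D keypoints of shape (N, 2) into a fixed-size person crop."""
    center, side = square_crop_params(bbox, margin)
    top_left = center - side / 2.0
    return (np.asarray(keypoints, dtype=float) - top_left) * (out_size / side)


def to_image_coords(crop_points, bbox, out_size=224, margin=1.2):
    """Inverse of to_crop_coords: map crop coordinates back to the full image."""
    center, side = square_crop_params(bbox, margin)
    top_left = center - side / 2.0
    return np.asarray(crop_points, dtype=float) * (side / out_size) + top_left


# A large (near) and a small (far) person, given as hypothetical detector outputs.
near_box = (100, 50, 200, 250)
far_box = (1000, 500, 1010, 520)

# The center of each box lands at the center of its crop, regardless of scale.
near_center = to_crop_coords([[150, 150]], near_box)
far_center = to_crop_coords([[1005, 510]], far_box)
```

The inverse mapping matters because absolute positions are defined with respect to the original image, so any per-crop prediction has to be related back to full-image coordinates before a scene-level constraint can be applied.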