CV | AI | Apr 26, 2025

VISUALCENT: Visual Human Analysis using Dynamic Centroid Representation

arXiv:2504.19032v1 | 2 citations | h-index: 11 | Has Code | FG
Originality: Incremental advance
AI Analysis

This work addresses limitations in real-time human analysis for applications like surveillance or robotics, though it appears incremental as it builds on centroid-based methods.

The authors tackled the problem of generalizability and scalability in multi-person visual human analysis by introducing VISUALCENT, a unified framework for pose estimation and instance segmentation that achieved higher mAP scores and faster frame rates on COCO and OCHuman datasets.

We introduce VISUALCENT, a unified human pose and instance segmentation framework that addresses generalizability and scalability limitations in multi-person visual human analysis. VISUALCENT leverages a centroid-based, bottom-up keypoint detection paradigm and uses a Keypoint Heatmap incorporating Disk Representation and KeyCentroid to identify the optimal keypoint coordinates. For the unified segmentation task, an explicit keypoint is defined as a dynamic centroid, called MaskCentroid, to swiftly cluster pixels to a specific human instance during rapid changes in human body movement or in significantly occluded environments. Experimental results on the COCO and OCHuman datasets demonstrate VISUALCENT's accuracy and real-time performance advantages, outperforming existing methods in mAP scores and frames per second. The implementation is available on the project page.
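To make the two ideas in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact formulation, so the binary disk target and the nearest-centroid pixel assignment below are assumptions chosen for illustration, and the function names `disk_heatmap` and `assign_to_centroids` are hypothetical.

```python
from math import hypot


def disk_heatmap(height, width, center, radius):
    """Return a height x width grid that is 1.0 inside a disk around
    `center` (given as (x, y)) and 0.0 elsewhere.

    Assumption: a hard binary disk; the paper's Disk Representation
    may weight pixels differently.
    """
    cx, cy = center
    return [
        [1.0 if hypot(x - cx, y - cy) <= radius else 0.0 for x in range(width)]
        for y in range(height)
    ]


def assign_to_centroids(foreground, centroids):
    """Map each foreground pixel (x, y) to the index of its nearest centroid.

    Assumption: plain Euclidean nearest-centroid grouping stands in for
    the MaskCentroid clustering step described in the abstract.
    """
    labels = {}
    for (x, y) in foreground:
        labels[(x, y)] = min(
            range(len(centroids)),
            key=lambda i: hypot(x - centroids[i][0], y - centroids[i][1]),
        )
    return labels


# Usage: one disk target for a keypoint, then two people separated by centroid.
heatmap = disk_heatmap(5, 5, center=(2, 2), radius=1)
labels = assign_to_centroids([(1, 1), (8, 1)], centroids=[(0, 0), (9, 0)])
```

In this sketch each pixel is grouped purely by distance, which is why a centroid that moves with the body (the "dynamic" part of MaskCentroid) would keep the grouping stable as a person moves.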

