Yanzhu Hu

CV
h-index7
3papers
17citations
Novelty52%
AI Score44

3 Papers

CVJul 22, 2022Code
Kinematics Modeling Network for Video-based Human Pose Estimation

Yonghao Dang, Jianqin Yin, Shaojie Zhang et al.

Estimating human poses from videos is critical in human-computer interaction. Joints cooperate rather than move independently during human movement. There are both spatial and temporal correlations between joints. Despite the positive results of previous approaches, most focus on modeling the spatial correlation between joints while only straightforwardly integrating features along the temporal dimension, ignoring the temporal correlation between joints. In this work, we propose a plug-and-play kinematics modeling module (KMM) to explicitly model temporal correlations between joints across different frames by calculating their temporal similarity. In this way, KMM can capture motion cues of the current joint relative to all joints in different time. Besides, we formulate video-based human pose estimation as a Markov Decision Process and design a novel kinematics modeling network (KIMNet) to simulate the Markov Chain, allowing KIMNet to locate joints recursively. Our approach achieves state-of-the-art results on two challenging benchmarks. In particular, KIMNet shows robustness to the occlusion. The code will be released at https://github.com/YHDang/KIMNet.

CVMar 25
MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies

Weixiang Shen, Yanzhu Hu, Che Liu et al.

Currently, evaluating vision-language models (VLMs) in medical imaging tasks oversimplifies clinical reality by relying on pre-selected 2D images that demand significant manual labor to curate. This setup misses the core challenge of realworld diagnostics: a true clinical agent must actively navigate full 3D volumes across multiple sequences or modalities to gather evidence and ultimately support a final decision. To address this, we propose MEDOPENCLAW, an auditable runtime designed to let VLMs operate dynamically within standard medical tools or viewers (e.g., 3D Slicer). On top of this runtime, we introduce MEDFLOWBENCH, a full-study medical imaging benchmark covering multi-sequence brain MRI and lung CT/PET. It systematically evaluates medical agentic capabilities across viewer-only, tool-use, and open-method tracks. Initial results reveal a critical insight: while state-of-the-art LLMs/VLMs (e.g., Gemini 3.1 Pro and GPT-5.4) can successfully navigate the viewer to solve basic study-level tasks, their performance paradoxically degrades when given access to professional support tools due to a lack of precise spatial grounding. By bridging the gap between static-image perception and interactive clinical workflows, MEDOPENCLAW and MEDFLOWBENCH establish a reproducible foundation for developing auditable, full-study medical imaging agents.

CVApr 22, 2024Code
DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation

Yonghao Dang, Jianqin Yin, Liyuan Liu et al.

Multi-person pose estimation (MPPE) presents a formidable yet crucial challenge in computer vision. Most existing methods predominantly concentrate on isolated interaction either between instances or joints, which is inadequate for scenarios demanding concurrent localization of both instances and joints. This paper introduces a novel CNN-based single-stage method, named Dual-path Hierarchical Relation Network (DHRNet), to extract instance-to-joint and joint-to-instance interactions concurrently. Specifically, we design a dual-path interaction modeling module (DIM) that strategically organizes cross-instance and cross-joint interaction modeling modules in two complementary orders, enriching interaction information by integrating merits from different correlation modeling branches. Notably, DHRNet excels in joint localization by leveraging information from other instances and joints. Extensive evaluations on challenging datasets, including COCO, CrowdPose, and OCHuman datasets, showcase DHRNet's state-of-the-art performance. The code will be released at https://github.com/YHDang/dhrnet-multi-pose-estimation.