CV · Jan 15

Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs

arXiv:2601.10369v2 · h-index: 50
Originality: Incremental advance
AI Analysis

This work addresses fine-grained evaluation of human pose editing in AIGC applications, offering a specialized benchmark and an evaluation method, but the contribution is incremental because it builds on existing MLLM techniques.

The paper tackles the evaluation of text-guided human pose editing, whose outputs often contain structural anomalies and artifacts. It introduces HPE-Bench, a benchmark of 1,700 samples from 17 models, and a layer-selective MLLM framework that the authors report outperforms prior approaches in both authenticity detection and quality regression.

Text-guided human pose editing has gained significant traction in AIGC applications. However, it remains plagued by structural anomalies and generative artifacts. Existing evaluation metrics often treat authenticity detection separately from quality assessment, failing to provide fine-grained insights into pose-specific inconsistencies. To address these limitations, we introduce HPE-Bench, a specialized benchmark comprising 1,700 standardized samples from 17 state-of-the-art editing models, offering both authenticity labels and multi-dimensional quality scores. Furthermore, we propose a unified framework based on layer-selective multimodal large language models (MLLMs). By employing contrastive LoRA tuning and a novel layer sensitivity analysis (LSA) mechanism, we identify the optimal feature layer for pose evaluation. Our framework achieves superior performance in both authenticity detection and multi-dimensional quality regression, effectively bridging the gap between forensic detection and quality assessment.
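The abstract does not say how LSA scores layers. Purely as an illustration, the sketch below assumes one common way to choose a feature layer: take pooled features from every layer, fit a lightweight probe on each, and keep the layer whose probe does best on held-out data. The nearest-centroid probe, the function names (`probe_accuracy`, `select_layer`), and the toy data are hypothetical stand-ins. They are not the authors' LSA mechanism, and the contrastive LoRA tuning step is not modeled at all.

```python
"""Hypothetical layer-selection sketch (NOT the paper's LSA): score each
layer's pooled features with a nearest-centroid probe, keep the best layer."""
from statistics import fmean
from typing import Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> list[float]:
    # Element-wise mean of a non-empty list of equal-length vectors.
    return [fmean(dim) for dim in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(train_x, train_y, test_x, test_y) -> float:
    # Assumed probe: one centroid per class label (e.g. 0 = authentic, 1 = edited);
    # each test sample is assigned to the nearest centroid (ties go to the lowest label).
    classes = sorted(set(train_y))
    cents = {
        c: centroid([x for x, y in zip(train_x, train_y) if y == c]) for c in classes
    }
    correct = 0
    for x, y in zip(test_x, test_y):
        pred = min(classes, key=lambda c: sq_dist(x, cents[c]))
        correct += pred == y
    return correct / len(test_y)


def select_layer(layer_feats, labels, split: int):
    """layer_feats[l][i] is the pooled feature of sample i at layer l (hypothetical
    layout). The first `split` samples fit the probe; the rest evaluate it."""
    scores = [
        probe_accuracy(feats[:split], labels[:split], feats[split:], labels[split:])
        for feats in layer_feats
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy usage: three "layers" with made-up 1-D features for six samples.
labels = [0, 1, 0, 1, 0, 1]
layer_feats = [
    [[0.0]] * 6,                    # layer 0: carries no label information
    [[float(y)] for y in labels],   # layer 1: perfectly separates the labels
    [[0.5]] * 6,                    # layer 2: carries no label information
]
best, scores = select_layer(layer_feats, labels, split=4)
print(best, scores)
```

In a real pipeline the per-layer features would presumably come from the MLLM's hidden states. For the quality-regression task, the probe would be a regressor scored by a rank correlation rather than by accuracy; both points are assumptions, not details from the abstract.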

