CV · Mar 30, 2023

Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks

Weihua Chen, Xianzhe Xu, Jian Jia, Hao Luo, Yaohua Wang, Fan Wang, Rong Jin, Xiuyu Sun

Stanford

arXiv:2303.17602v1 · 28.4 · 165 citations · h-index: 21 · Has Code

Originality: Highly original

AI Analysis

This work addresses the need for adaptable representations in human-centric AI applications. Its approach is novel rather than incremental, though specific to this domain.

The paper tackles the problem of learning a general human representation from unlabeled images to benefit downstream human-centric visual tasks. It introduces SOLIDER, a framework that uses prior knowledge from human images to build pseudo semantic labels and bring semantic information into the learned representation. A semantic controller lets users adjust the ratio of semantic to appearance information after training, and the method achieves state-of-the-art performance on six tasks.

Human-centric visual tasks have attracted increasing research attention due to their widespread applications. In this paper, we aim to learn a general human representation from massive unlabeled human images which can benefit downstream human-centric tasks to the maximum extent. We call this method SOLIDER, a Semantic cOntrollable seLf-supervIseD lEaRning framework. Unlike existing self-supervised learning methods, SOLIDER utilizes prior knowledge from human images to build pseudo semantic labels and import more semantic information into the learned representation. Meanwhile, we note that different downstream tasks always require different ratios of semantic information and appearance information. For example, human parsing requires more semantic information, while person re-identification needs more appearance information for identification purposes. So a single learned representation cannot fit all requirements. To solve this problem, SOLIDER introduces a conditional network with a semantic controller. After the model is trained, users can send values to the controller to produce representations with different ratios of semantic information, which can fit the different needs of downstream tasks. Finally, SOLIDER is verified on six downstream human-centric visual tasks. It outperforms the state of the art and builds new baselines for these tasks. The code is released at https://github.com/tinyvision/SOLIDER.
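To make the controller idea in the abstract concrete, here is a minimal sketch of how a scalar input could modulate a feature vector after training. It is an illustration under stated assumptions, not the SOLIDER implementation: the class `ToySemanticController`, the parameter name `lam`, the [0, 1] range, and the scale-and-shift form of the conditioning are all hypothetical choices made for this example, and the weights are hand-picked rather than learned.

```python
from typing import List, Sequence


class ToySemanticController:
    """Hypothetical controller: maps a scalar lam to a per-channel scale and shift.

    This is NOT the SOLIDER code. It only illustrates the interface the
    abstract describes: one trained model, one scalar input, different outputs.
    """

    def __init__(self, scale_weights: Sequence[float], shift_weights: Sequence[float]) -> None:
        if len(scale_weights) != len(shift_weights):
            raise ValueError("scale and shift weights must have the same length")
        self.scale_weights = list(scale_weights)
        self.shift_weights = list(shift_weights)

    def modulate(self, features: Sequence[float], lam: float) -> List[float]:
        """Return features rescaled and shifted according to lam (assumed in [0, 1])."""
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        if len(features) != len(self.scale_weights):
            raise ValueError("features must match the controller's channel count")
        # With lam = 0 the features pass through unchanged; larger lam applies
        # more of the (here hand-picked, in practice learned) scale and shift.
        return [
            (1.0 + lam * w) * x + lam * b
            for x, w, b in zip(features, self.scale_weights, self.shift_weights)
        ]


# Hand-picked weights and features, purely for illustration.
controller = ToySemanticController(scale_weights=[0.5, -0.5], shift_weights=[1.0, 0.0])
backbone_features = [2.0, 4.0]

appearance_heavy = controller.modulate(backbone_features, lam=0.0)
semantic_heavy = controller.modulate(backbone_features, lam=1.0)
```

In this toy setup a task such as person re-identification would request a low `lam` and human parsing a high one, with the same underlying weights serving both.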
