CVMar 11, 2024
3D-aware Image Generation and Editing with Multi-modal ConditionsBo Li, Yi-ke Li, Zhi-fen He et al.
3D-consistent image generation from a single 2D semantic label is an important and challenging research topic in computer graphics and computer vision. Although some related works have made great progress in this field, most of the existing methods suffer from poor disentanglement performance of shape and appearance, and lack multi-modal control. In this paper, we propose a novel end-to-end 3D-aware image generation and editing model incorporating multiple types of conditional inputs, including pure noise, text and reference image. On the one hand, we dive into the latent space of 3D Generative Adversarial Networks (GANs) and propose a novel disentanglement strategy to separate appearance features from shape features during the generation process. On the other hand, we propose a unified framework for flexible image generation and editing tasks with multi-modal conditions. Our method can generate diverse images with distinct noises, edit the attribute through a text description and conduct style transfer by giving a reference RGB image. Extensive experiments demonstrate that the proposed method outperforms alternative approaches both qualitatively and quantitatively on image generation and editing.
CVOct 11, 2025
Collaborative Learning of Semantic-Aware Feature Learning and Label Recovery for Multi-Label Image Recognition with Incomplete LabelsZhi-Fen He, Ren-Dong Xie, Bo Li et al.
Multi-label image recognition with incomplete labels is a critical learning task and has emerged as a focal topic in computer vision. However, this task is confronted with two core challenges: semantic-aware feature learning and missing label recovery. In this paper, we propose a novel Collaborative Learning of Semantic-aware feature learning and Label recovery (CLSL) method for multi-label image recognition with incomplete labels, which unifies the two aforementioned challenges into a unified learning framework. More specifically, we design a semantic-related feature learning module to learn robust semantic-related features by discovering semantic information and label correlations. Then, a semantic-guided feature enhancement module is proposed to generate high-quality discriminative semantic-aware features by effectively aligning visual and semantic feature spaces. Finally, we introduce a collaborative learning framework that integrates semantic-aware feature learning and label recovery, which can not only dynamically enhance the discriminability of semantic-aware features but also adaptively infer and recover missing labels, forming a mutually reinforced loop between the two processes. Extensive experiments on three widely used public datasets (MS-COCO, VOC2007, and NUS-WIDE) demonstrate that CLSL outperforms the state-of-the-art multi-label image recognition methods with incomplete labels.
CVJul 20, 2025
Semantic-Aware Representation Learning via Conditional Transport for Multi-Label Image ClassificationRen-Dong Xie, Zhi-Fen He, Bo Li et al.
Multi-label image classification is a critical task in machine learning that aims to accurately assign multiple labels to a single image. While existing methods often utilize attention mechanisms or graph convolutional networks to model visual representations, their performance is still constrained by two critical limitations: the inability to learn discriminative semantic-aware features, and the lack of fine-grained alignment between visual representations and label embeddings. To tackle these issues in a unified framework, this paper proposes a novel approach named Semantic-aware representation learning via Conditional Transport for Multi-Label Image Classification (SCT). The proposed method introduces a semantic-related feature learning module that extracts discriminative label-specific features by emphasizing semantic relevance and interaction, along with a conditional transport-based alignment mechanism that enables precise visual-semantic alignment. Extensive experiments on two widely-used benchmark datasets, VOC2007 and MS-COCO, validate the effectiveness of SCT and demonstrate its superior performance compared to existing state-of-the-art methods.