Bo Ma

9.6 | CV | Oct 16, 2024

TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt

Jiahui Yang, Donglin Di, Baorui Ma et al.

In recent years, advancements in generative models have significantly expanded the capabilities of text-to-3D generation. Many approaches rely on Score Distillation Sampling (SDS) technology. However, SDS struggles to accommodate multi-condition inputs, such as text and visual prompts, in customized generation tasks. To explore the core reasons, we decompose SDS into a difference term and a classifier-free guidance term. Our analysis identifies the core issue as arising from the difference term and the random noise addition during the optimization process, both contributing to deviations from the target mode during distillation. To address this, we propose a novel algorithm, Classifier Score Matching (CSM), which removes the difference term in SDS and uses a deterministic noise addition process to reduce noise during optimization, effectively overcoming the low-quality limitations of SDS in our customized generation framework. Based on CSM, we integrate visual prompt information with an attention fusion mechanism and sampling guidance techniques, forming the Visual Prompt CSM (VPCSM) algorithm. Furthermore, we introduce a Semantic-Geometry Calibration (SGC) module to enhance quality through improved textual information integration. We present our approach as TV-3DG, with extensive experiments demonstrating its capability to achieve stable, high-quality, customized 3D generation. Project page: https://yjhboy.github.io/TV-3DG

4.7 | CV | Feb 6, 2021

Two-Step Image Dehazing with Intra-domain and Inter-domain Adaptation

Xin Yi, Bo Ma, Yulin Zhang et al.

Owing to differences in data distributions, intra-domain and inter-domain gaps are widely present in image processing tasks. In the field of image dehazing, certain previous works have paid attention to the inter-domain gap between the synthetic domain and the real domain. However, those methods only establish the connection from the source domain to the target domain without taking into account the large distribution shift within the target domain (intra-domain gap). In this work, we propose a Two-Step Dehazing Network (TSDN) with an intra-domain adaptation and a constrained inter-domain adaptation. First, we subdivide the distributions within the synthetic domain into subsets and mine the optimal subset (easy samples) by loss-based supervision. To alleviate the intra-domain gap of the synthetic domain, we propose an intra-domain adaptation that aligns the distributions of the other subsets to the optimal subset by adversarial learning. Finally, we conduct the constrained inter-domain adaptation from the real domain to the optimal subset of the synthetic domain, alleviating the shift between the two domains as well as the distribution shift within the real domain. Extensive experimental results demonstrate that our framework performs favorably against state-of-the-art algorithms on both synthetic and real datasets.

2 Papers