CVJul 12, 2022
Domain Gap Estimation for Source Free Unsupervised Domain Adaptation with Many ClassifiersZiyang Zong, Jun He, Lei Zhang et al.
In theory, the success of unsupervised domain adaptation (UDA) largely relies on domain gap estimation. However, for source free UDA, the source domain data can not be accessed during adaptation, which poses great challenge of measuring the domain gap. In this paper, we propose to use many classifiers to learn the source domain decision boundaries, which provides a tighter upper bound of the domain gap, even if both of the domain data can not be simultaneously accessed. The source model is trained to push away each pair of classifiers whilst ensuring the correctness of the decision boundaries. In this sense, our many classifiers model separates the source different categories as far as possible which induces the maximum disagreement of many classifiers in the target domain, thus the transferable source domain knowledge is maximized. For adaptation, the source model is adapted to maximize the agreement among pairs of the classifiers. Thus the target features are pushed away from the decision boundaries. Experiments on several datasets of UDA show that our approach achieves state of the art performance among source free UDA approaches and can even compete to source available UDA methods.
CVFeb 21, 2025
Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal CorrelationFengcheng Yu, Haoran Xu, Canming Xia et al.
Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. However, existing methods often suffer from temporal inconsistencies, manifesting as flickering effects that compromise visual experience and adversely affect decision-making. While recent approaches have incorporated historical data to mitigate the issue, they often incur high computational costs and may introduce noisy information that interferes with object detection. We propose OccLinker, a novel plugin framework designed to seamlessly integrate with existing VONs for boosting performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and produces correction occupancy components to refine the base network's predictions. We propose a new temporal consistency metric to quantitatively identify flickering effects. Extensive experiments on two benchmark datasets demonstrate that our method delivers superior performance with negligible computational overhead, while effectively eliminating flickering artifacts.
CVNov 19, 2024
HouseTune: Two-Stage Floorplan Generation with LLM AssistanceZiyang Zong, Guanying Chen, Zhaohuan Zhan et al.
This paper proposes a two-stage text-to-floorplan generation framework that combines the reasoning capability of Large Language Models (LLMs) with the generative power of diffusion models. In the first stage, we leverage a Chain-of-Thought (CoT) prompting strategy to guide an LLM in generating an initial layout (Layout-Init) from natural language descriptions, which ensures a user-friendly and intuitive design process. However, Layout-Init may lack precise geometric alignment and fine-grained structural details. To address this, the second stage employs a conditional diffusion model to refine Layout-Init into a final floorplan (Layout-Final) that better adheres to physical constraints and user requirements. Unlike prior methods, our approach effectively reduces the difficulty of floorplan generation learning without the need for extensive domain-specific training data. Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, which validates its effectiveness in practical home design applications.