CVMay 23, 2025Code
T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion ModelsXiaoyu Ye, Songjie Cheng, Yongtao Wang et al.
Recent advances in text-to-video (T2V) diffusion models have significantly enhanced the quality of generated videos. However, their capability to produce explicit or harmful content introduces new challenges related to misuse and potential rights violations. To address this newly emerging threat, we propose unlearning-based concept erasing as a solution. First, we adopt negatively-guided velocity prediction fine-tuning and enhance it with prompt augmentation to ensure robustness against prompts refined by large language models (LLMs). Second, to achieve precise unlearning, we incorporate mask-based localization regularization and concept preservation regularization to preserve the model's ability to generate non-target concepts. Extensive experiments demonstrate that our method effectively erases a specific concept while preserving the model's generation capability for all other concepts, outperforming existing methods. We provide the unlearned models in \href{https://github.com/VDIGPKU/T2VUnlearning.git}{https://github.com/VDIGPKU/T2VUnlearning.git}.
CVFeb 11, 2024
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian SplattingXiaoyu Zhou, Xingjian Ran, Yajiao Xiong et al.
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an instance-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. The source codes and models will be available at gala3d.github.io.
CVAug 28, 2025
DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving ScenesYajiao Xiong, Xiaoyu Zhou, Yongtao Wan et al.
We present DrivingGaussian++, an efficient and effective framework for realistic reconstructing and controllable editing of surrounding dynamic autonomous driving scenes. DrivingGaussian++ models the static background using incremental 3D Gaussians and reconstructs moving objects with a composite dynamic Gaussian graph, ensuring accurate positions and occlusions. By integrating a LiDAR prior, it achieves detailed and consistent scene reconstruction, outperforming existing methods in dynamic scene reconstruction and photorealistic surround-view synthesis. DrivingGaussian++ supports training-free controllable editing for dynamic driving scenes, including texture modification, weather simulation, and object manipulation, leveraging multi-view images and depth priors. By integrating large language models (LLMs) and controllable editing, our method can automatically generate dynamic object motion trajectories and enhance their realism during the optimization process. DrivingGaussian++ demonstrates consistent and realistic editing results and generates dynamic multi-view driving scenarios, while significantly enhancing scene diversity. More results and code can be found at the project site: https://xiong-creator.github.io/DrivingGaussian_plus.github.io