Deyu Zeng

CV
h-index 8
3 papers
4 citations
Novelty 48%
AI Score 42

3 Papers

CV | Mar 10
ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph

Junhao Cai, Deyu Zeng, Junhao Pang et al.

Current text-to-3D generation methods excel in natural scenes but struggle with industrial applications due to two critical limitations: domain adaptation challenges, where conventional LoRA fusion causes knowledge interference across categories, and geometric reasoning deficiencies, where pairwise consistency constraints fail to capture the higher-order structural dependencies essential for precision manufacturing. We propose a novel framework named ForgeDreamer that addresses both challenges through two key innovations. First, we introduce a Multi-Expert LoRA Ensemble mechanism that consolidates multiple category-specific LoRA models into a unified representation, achieving superior cross-category generalization while eliminating knowledge interference. Second, building on enhanced semantic understanding, we develop a Cross-View Hypergraph Geometric Enhancement approach that captures structural dependencies spanning multiple viewpoints simultaneously. These components work synergistically: improved semantic understanding enables more effective geometric reasoning, while hypergraph modeling ensures manufacturing-level consistency. Extensive experiments on a custom industrial dataset demonstrate superior semantic generalization and enhanced geometric fidelity compared to state-of-the-art approaches. Our code and data are provided in the supplementary material attached in the appendix for review purposes.

CV | Oct 29, 2024 | Code
A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Image Anomaly Detection

Yuxuan Lin, Yang Chang, Xuan Tong et al.

In the advancement of industrial informatization, unsupervised anomaly detection technology effectively overcomes the scarcity of abnormal samples and significantly enhances the automation and reliability of smart manufacturing. As an important branch, industrial image anomaly detection focuses on automatically identifying visual anomalies in industrial scenarios (such as product surface defects, assembly errors, and equipment appearance anomalies) through computer vision techniques. With the rapid development of Unsupervised Industrial Image Anomaly Detection (UIAD), excellent detection performance has been achieved not only in the RGB setting but also in the 3D and multimodal (RGB and 3D) settings. However, existing surveys primarily focus on UIAD tasks in the RGB setting, with little discussion of the 3D and multimodal settings. To address this gap, this article provides a comprehensive review of UIAD tasks in the three modal settings. Specifically, we first introduce the task concept and process of UIAD. We then give an overview of the research on UIAD in the three modal settings (RGB, 3D, and multimodal), including datasets and methods, and review multimodal feature fusion strategies in the multimodal setting. Finally, we summarize the main challenges faced by UIAD tasks in the three modal settings and offer insights into future development directions, aiming to provide researchers with a comprehensive reference and new perspectives for the advancement of industrial informatization. Corresponding resources are available at https://github.com/Sunny5250/Awesome-Multi-Setting-UIAD.

CV | Feb 10, 2025 | Code
Multimodal Task Representation Memory Bank vs. Catastrophic Forgetting in Anomaly Detection

You Zhou, Jiangshan Zhao, Deyu Zeng et al.

Unsupervised Continuous Anomaly Detection (UCAD) faces significant challenges in multi-task representation learning, with existing methods suffering from incomplete representation and catastrophic forgetting. Unlike supervised models, unsupervised scenarios lack prior information, making it difficult to effectively distinguish redundant and complementary multimodal features. To address this, we propose the Multimodal Task Representation Memory Bank (MTRMB) method, built on two key technical innovations: (1) a Key-Prompt-Multimodal Knowledge (KPMK) mechanism that uses concise key prompts to guide cross-modal feature interaction between BERT and ViT; and (2) Refined Structure-based Contrastive Learning (RSCL), which leverages Grounding DINO and SAM to generate precise segmentation masks, pulling features of the same structural region closer while pushing features of different structural regions apart. Experiments on the MVTec AD and VisA datasets demonstrate MTRMB's superiority, achieving an average detection accuracy of 0.921 at the lowest forgetting rate and significantly outperforming state-of-the-art methods. We plan to open-source the code on GitHub.
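
The idea of pulling features of the same structural region together while pushing different regions apart can be illustrated with a generic supervised-contrastive-style loss. The sketch below is not the RSCL formulation from the paper, which the abstract does not specify. The function names, the cosine similarity, the `temperature` value, and the use of integer region ids in place of segmentation masks are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def region_contrastive_loss(features, region_ids, temperature=0.1):
    """Generic contrastive loss over region labels (hypothetical helper).

    features:   list of feature vectors, one per patch
    region_ids: list of integer region labels, standing in for mask regions
    Patches sharing a region id are treated as positives; all others
    act as negatives in the softmax denominator.
    """
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and region_ids[j] == region_ids[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        logits = {j: cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a same-region patch.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

# Two tight clusters of patch features.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = region_contrastive_loss(feats, [0, 0, 1, 1])  # labels match clusters
mixed = region_contrastive_loss(feats, [0, 1, 0, 1])    # labels cut across clusters
```

In this toy case the loss is small when region labels agree with the feature clusters and large when they cut across them, which is the behavior a structure-based contrastive objective is meant to reward.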