Physic-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation
This work addresses quality assurance in smart manufacturing by improving anomaly detection in complex processes like robotic welding, representing a novel method for a known bottleneck.
The paper tackles the problem of process-logic blindness and heterogeneity gaps in multimodal unsupervised anomaly detection for robotic welding by proposing Physic-HM, which incorporates physical inductive bias and hierarchical modulation, achieving a state-of-the-art I-AUROC of 90.7% on the Weld-4M benchmark.
Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes like robotic welding. However, existing methods often suffer from process-logic blindness, treating process modalities (e.g., real-time video, audio, and sensors) and result modalities (e.g., post-weld images) as symmetric feature sources, thereby ignoring the inherent unidirectional physical generative logic. Furthermore, the heterogeneity gap between high-dimensional visual data and low-dimensional sensor signals frequently leads to critical process context being drowned out. In this paper, we propose Physic-HM, a multimodal UAD framework that explicitly incorporates physical inductive bias to model the process-to-result dependency. Specifically, our framework incorporates two key innovations: a Sensor-Guided PHM Modulation mechanism that utilizes low-dimensional sensor signals as context to guide high-dimensional audio-visual feature extraction, and a Physic-Hierarchical architecture that enforces a unidirectional generative mapping to identify anomalies that violate physical consistency. Extensive experiments on Weld-4M benchmark demonstrate that Physic-HM achieves a SOTA I-AUROC of 90.7%. The source code of Physic-HM will be released after the paper is accepted.