CV · LG · Apr 16

H2VLR: Heterogeneous Hypergraph Vision-Language Reasoning for Few-Shot Anomaly Detection

arXiv:2604.14507 · 59.3 · h-index: 11
Predicted impact: top 59% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For researchers in anomaly detection, this work addresses the limitation of pairwise feature matching in existing VLM-based few-shot methods by introducing high-order relational reasoning.

The paper proposes a Heterogeneous Hypergraph Vision-Language Reasoning (H2VLR) framework for few-shot anomaly detection, which models visual and semantic relations as a hypergraph to capture structural dependencies and global consistency. The authors report that it often achieves state-of-the-art performance on representative industrial and medical benchmarks.

Anomaly detection is a classic vision task widely applied in industrial inspection and medical imaging, where data scarcity is a frequent issue. Few-shot anomaly detection (FSAD) addresses this issue and is attracting increasing attention. In recent years, beyond the traditional purely visual paradigm, Vision-Language Models (VLMs) have been extensively explored to advance this field. However, almost all existing VLM-based FSAD schemes perform anomaly inference only by pairwise feature matching, ignoring structural dependencies and global consistency. To further benefit FSAD via VLMs, we propose a Heterogeneous Hypergraph Vision-Language Reasoning (H2VLR) framework. It reformulates FSAD as a high-order inference problem over visual-semantic relations by jointly modeling visual regions and semantic concepts in a unified hypergraph. Experimental comparisons verify the effectiveness and advantages of H2VLR, which often achieves state-of-the-art (SOTA) performance on representative industrial and medical benchmarks. Our code will be released upon acceptance.
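
The abstract does not specify how the hypergraph is built or how inference runs over it, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes visual patches and text concepts are already embedded in a shared space (as CLIP-style VLMs provide), forms one hyperedge per node from its nearest neighbours, and applies a single round of generic node-to-hyperedge-to-node mean aggregation. All names (`build_hyperedges`, `propagate`, the toy vectors, the scoring rule) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_hyperedges(nodes, k):
    """Assumed construction: one hyperedge per node, joining it to its k nearest neighbours."""
    edges = []
    for i, v in enumerate(nodes):
        others = [j for j in range(len(nodes)) if j != i]
        ranked = sorted(others, key=lambda j: cosine(v, nodes[j]), reverse=True)
        edges.append(frozenset([i] + ranked[:k]))
    return edges

def propagate(nodes, edges):
    """One round of mean aggregation: nodes -> hyperedges -> nodes."""
    dim = len(nodes[0])
    edge_feats = [[sum(nodes[i][d] for i in e) / len(e) for d in range(dim)]
                  for e in edges]
    out = []
    for i in range(len(nodes)):
        # Every node belongs to at least its own hyperedge, so this is never empty.
        member = [f for e, f in zip(edges, edge_feats) if i in e]
        out.append([sum(f[d] for f in member) / len(member) for d in range(dim)])
    return out

# Toy heterogeneous node set: three visual patches, then two text concepts.
patches = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
concepts = {"normal": [1.0, 0.0], "anomalous": [0.0, 1.0]}
nodes = patches + [concepts["normal"], concepts["anomalous"]]

edges = build_hyperedges(nodes, k=1)
refined = propagate(nodes, edges)

normal_vec, anomalous_vec = refined[3], refined[4]
# Hypothetical scoring rule: similarity to "anomalous" minus similarity to "normal".
scores = [cosine(p, anomalous_vec) - cosine(p, normal_vec) for p in refined[:3]]
print(scores)
```

In this toy setup each hyperedge can mix patch and concept nodes, so a patch's refined feature depends on a group of related nodes rather than on a single patch-to-text comparison, which is the contrast with pairwise matching that the abstract draws.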

