CVSep 27, 2025Code
C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World DetectionSiheng Wang, Zhengdao Li, Yanshu Li et al.
Object detection has advanced significantly in the closed-set setting, but real-world deployment remains limited by two challenges: poor generalization to unseen categories and insufficient robustness under adverse conditions. Prior research has explored these issues separately: visible-infrared detection improves robustness but lacks generalization, while open-world detection leverages vision-language alignment strategy for category diversity but struggles under extreme environments. This trade-off leaves robustness and diversity difficult to achieve simultaneously. To mitigate these issues, we propose \textbf{C3-OWD}, a curriculum cross-modal contrastive learning framework that unifies both strengths. Stage~1 enhances robustness by pretraining with RGBT data, while Stage~2 improves generalization via vision-language alignment. To prevent catastrophic forgetting between two stages, we introduce an Exponential Moving Average (EMA) mechanism that theoretically guarantees preservation of pre-stage performance with bounded parameter lag and function consistency. Experiments on FLIR, OV-COCO, and OV-LVIS demonstrate the effectiveness of our approach: C3-OWD achieves $80.1$ AP$^{50}$ on FLIR, $48.6$ AP$^{50}_{\text{Novel}}$ on OV-COCO, and $35.7$ mAP$_r$ on OV-LVIS, establishing competitive performance across both robustness and diversity evaluations. Code available at: https://github.com/justin-herry/C3-OWD.git.
LGMay 8, 2025
QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware PrimitivesXuzhi Zhang, Shaohui Peng, Qirui Zhou et al.
Computation-intensive tensor operators constitute over 90\% of the computations in Large Language Models (LLMs) and Deep Neural Networks.Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks portability.LLMs excel at generating high-level language codes, but they struggle to fully comprehend hardware characteristics and produce high-performance tensor operators. We introduce a tensor-operator auto-generation framework with a one-line user prompt (QiMeng-TensorOp), which enables LLMs to automatically exploit hardware characteristics to generate tensor operators with hardware primitives, and tune parameters for optimal performance across diverse hardware. Experimental results on various hardware platforms, SOTA LLMs, and typical tensor operators demonstrate that QiMeng-TensorOp effectively unleashes the computing capability of various hardware platforms, and automatically generates tensor operators of superior performance. Compared with vanilla LLMs, QiMeng-TensorOp achieves up to $1291 \times$ performance improvement. Even compared with human experts, QiMeng-TensorOp could reach $251 \%$ of OpenBLAS on RISC-V CPUs, and $124 \%$ of cuBLAS on NVIDIA GPUs. Additionally, QiMeng-TensorOp also significantly reduces development costs by $200 \times$ compared with human experts.
LGApr 29, 2025
GLIP-OOD: Zero-Shot Graph OOD Detection with Graph Foundation ModelHaoyan Xu, Zhengtao Yao, Xuzhi Zhang et al.
Out-of-distribution (OOD) detection is critical for ensuring the safety and reliability of machine learning systems, particularly in dynamic and open-world environments. In the vision and text domains, zero-shot OOD detection - which requires no training on in-distribution (ID) data - has advanced significantly through the use of large-scale pretrained models, such as vision-language models (VLMs) and large language models (LLMs). However, zero-shot OOD detection in graph-structured data remains largely unexplored, primarily due to the challenges posed by complex relational structures and the absence of powerful, large-scale pretrained models for graphs. In this work, we take the first step toward enabling zero-shot graph OOD detection by leveraging a graph foundation model (GFM). Our experiments show that, when provided only with class label names for both ID and OOD categories, the GFM can effectively perform OOD detection - often surpassing existing "supervised" OOD detection methods that rely on extensive labeled node data. We further address the practical scenario in which OOD label names are not available in real-world settings by introducing GLIP-OOD, a framework that uses LLMs to generate semantically informative pseudo-OOD labels from unlabeled data. These generated OOD labels allow the GFM to better separate ID and OOD classes, facilitating more precise OOD detection - all without any labeled nodes (only ID label names). To our knowledge, this is the first approach to achieve node-level graph OOD detection in a fully zero-shot setting, and it attains performance comparable to state-of-the-art supervised methods on four benchmark text-attributed graph datasets.