CVDec 29, 2025Code
ProGuard: Towards Proactive Multimodal SafeguardShaohan Yu, Lijun Li, Chenyang Si et al.
The rapid evolution of generative models has led to a continuous emergence of multimodal safety risks, exposing the limitations of existing defense methods. To address these challenges, we propose ProGuard, a vision-language proactive guard that identifies and describes out-of-distribution (OOD) safety risks without the need for model adjustments required by traditional reactive approaches. We first construct a modality-balanced dataset of 87K samples, each annotated with both binary safety labels and risk categories under a hierarchical multimodal safety taxonomy, effectively mitigating modality bias and ensuring consistent moderation across text, image, and text-image inputs. Based on this dataset, we train our vision-language base model purely through reinforcement learning (RL) to achieve efficient and concise reasoning. To approximate proactive safety scenarios in a controlled setting, we further introduce an OOD safety category inference task and augment the RL objective with a synonym-bank-based similarity reward that encourages the model to generate concise descriptions for unseen unsafe categories. Experimental results show that ProGuard achieves performance comparable to closed-source large models on binary safety classification, substantially outperforms existing open-source guard models on unsafe content categorization. Most notably, ProGuard delivers a strong proactive moderation ability, improving OOD risk detection by 52.6% and OOD risk description by 64.8%.
AINov 19, 2025
SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning ModelsXin Gao, Shaohan Yu, Zerui Chen et al.
Large Reasoning Models (LRMs) improve answer quality through explicit chain-of-thought, yet this very capability introduces new safety risks: harmful content can be subtly injected, surface gradually, or be justified by misleading rationales within the reasoning trace. Existing safety evaluations, however, primarily focus on output-level judgments and rarely capture these dynamic risks along the reasoning process. In this paper, we present SafeRBench, the first benchmark that assesses LRM safety end-to-end -- from inputs and intermediate reasoning to final outputs. (1) Input Characterization: We pioneer the incorporation of risk categories and levels into input design, explicitly accounting for affected groups and severity, and thereby establish a balanced prompt suite reflecting diverse harm gradients. (2) Fine-Grained Output Analysis: We introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units, enabling fine-grained evaluation across ten safety dimensions. (3) Human Safety Alignment: We validate LLM-based evaluations against human annotations specifically designed to capture safety judgments. Evaluations on 19 LRMs demonstrate that SafeRBench enables detailed, multidimensional safety assessment, offering insights into risks and protective mechanisms from multiple perspectives.
LGNov 30, 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal ForecastingShaohan Yu, Pan Deng, Yu Zhao et al.
Achieving both accurate and consistent predictive performance across spatial nodes is crucial for ensuring the validity and reliability of outcomes in fair spatial-temporal forecasting tasks. However, existing training methods treat heterogeneous nodes with a fully averaged perspective, resulting in inherently biased prediction targets. Balancing accuracy and consistency is particularly challenging due to the multi-objective nature of spatial-temporal forecasting. To address this issue, we propose a novel Heterogeneity-Informed Mixture-of-Experts (HiMoE) framework that delivers both uniform and precise spatial-temporal predictions. From a model architecture perspective, we design the Heterogeneity-Informed Graph Convolutional Network (HiGCN) to address trend heterogeneity, and we introduce the Node-wise Mixture-of-Experts (NMoE) module to handle cardinality heterogeneity across nodes. From an evaluation perspective, we propose STFairBench, a benchmark that handles fairness in spatial-temporal prediction from both training and evaluation stages. Extensive experiments on four real-world datasets demonstrate that HiMoE achieves state-of-the-art performance, outperforming the best baseline by at least 9.22% across all evaluation metrics.