CVAug 22, 2024
Assessing the Uncertainty and Robustness of the Laptop Refurbishing SoftwareChengjie Lu, Jiahui Wu, Shaukat Ali et al.
Refurbishing laptops extends their lives while contributing to reducing electronic waste, which promotes building a sustainable future. To this end, the Danish Technological Institute (DTI) focuses on the research and development of several robotic applications empowered with software, including laptop refurbishing. Cleaning represents a major step in refurbishing and involves identifying and removing stickers from laptop surfaces. Software plays a crucial role in the cleaning process. For instance, the software integrates various object detection models to identify and remove stickers from laptops automatically. However, given the diversity in types of stickers (e.g., shapes, colors, locations), identification of the stickers is highly uncertain, thereby requiring explicit quantification of uncertainty associated with the identified stickers. Such uncertainty quantification can help reduce risks in removing stickers, which, for example, could otherwise result in software faults damaging laptop surfaces. For uncertainty quantification, we adopted the Monte Carlo Dropout method to evaluate six sticker detection models (SDMs) from DTI using three datasets: the original image dataset from DTI and two datasets generated with vision language models, i.e., DALL-E-3 and Stable Diffusion-3. In addition, we presented novel robustness metrics concerning detection accuracy and uncertainty to assess the robustness of the SDMs based on adversarial datasets generated from the three datasets using a dense adversary method. Our evaluation results show that different SDMs perform differently regarding different metrics. Based on the results, we provide SDM selection guidelines and lessons learned from various perspectives.
CRMar 26
Efficient ML-DSA Public Key Management Method with Identity for PKI and Its ApplicationPenghui Liu, Yi Niu, Xiaoxiong Zhong et al.
With the rapid evolution of the Industrial Internet of Things (IIoT), the boundaries and scale of the Internet are continuously expanding. Consequently, the limitations of traditional certificate-based Public Key Infrastructure (PKI) have become increasingly evident, particularly in scenarios requiring large-scale certificate storage, verification, and frequent transmission. These challenges are expected to be further amplified by the widespread adoption of post-quantum cryptography. In this paper, we propose a novel identity-based public key management framework for PKI based on post-quantum cryptography, termed \textit{IPK-pq}. This approach implements an identity key generation protocol leveraging NIST ML-DSA and random matrix theory. Building on the concept of the Composite Public Key (CPK), \textit{IPK-pq} addresses the linear collusion problem inherent in CPK through an enhanced identity mapping mechanism. Furthermore, it simplifies the verification of the declared public key's authenticity, effectively reducing the complexity associated with certificate-based key management. We also provide a formal security proof for \textit{IPK-pq}, covering both individual private key components and the composite private key. To validate our approach, formally, we directly implement and evaluate \textit{IPK-pq} within a typical PKI application scenario: Resource PKI (RPKI). Comparative experimental results demonstrate that an RPKI system based on \textit{IPK-pq} yields significant improvements in efficiency and scalability. These results validate the feasibility and rationality of \textit{IPK-pq}, positioning it as a strong candidate for next-generation RPKI systems capable of securely managing large-scale routing information.
CVJan 28
Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language EditsZelong Sun, Jiahui Wu, Ying Ba et al.
As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset containing 24K portrait collections and 573K samples with high-quality modification text annotations, constructed through an Large Vison-Language Model-based pipeline with inversion-based verification. We further propose SCheese, a framework that combines text-guided generation with hierarchical identity and detail preservation. SCheese employs adaptive feature fusion mechanism to maintain identity consistency, and ConsistencyNet to inject fine-grained features for detail consistency. Comprehensive experiments validate the effectiveness of CHEESE in advancing PCG, with SCheese achieving state-of-the-art performance.
CVMar 26, 2024
Random-coupled Neural NetworkHaoran Liu, Mingzhe Liu, Peng Li et al.
Improving the efficiency of current neural networks and modeling them in biological neural systems have become popular research directions in recent years. Pulse-coupled neural network (PCNN) is a well applicated model for imitating the computation characteristics of the human brain in computer vision and neural network fields. However, differences between the PCNN and biological neural systems remain: limited neural connection, high computational cost, and lack of stochastic property. In this study, random-coupled neural network (RCNN) is proposed. It overcomes these difficulties in PCNN's neuromorphic computing via a random inactivation process. This process randomly closes some neural connections in the RCNN model, realized by the random inactivation weight matrix of link input. This releases the computational burden of PCNN, making it affordable to achieve vast neural connections. Furthermore, the image and video processing mechanisms of RCNN are researched. It encodes constant stimuli as periodic spike trains and periodic stimuli as chaotic spike trains, the same as biological neural information encoding characteristics. Finally, the RCNN is applicated to image segmentation, fusion, and pulse shape discrimination subtasks. It is demonstrated to be robust, efficient, and highly anti-noised, with outstanding performance in all applications mentioned above.
SEFeb 18, 2025
Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous VehiclesJiahui Wu, Chengjie Lu, Aitor Arrieta et al.
Autonomous vehicles (AVs) make driving decisions without human intervention. Therefore, ensuring AVs' dependability is critical. Despite significant research and development in AV development, their dependability assurance remains a significant challenge due to the complexity and unpredictability of their operating environments. Scenario-based testing evaluates AVs under various driving scenarios, but the unlimited number of potential scenarios highlights the importance of identifying critical scenarios that can violate safety or functional requirements. Such requirements are inherently interdependent and need to be tested simultaneously. To this end, we propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach to generate critical scenarios that simultaneously test interdependent safety and functional requirements. MOEQT adapts Envelope Q-learning as the MORL algorithm, which dynamically adapts multi-objective weights to balance the relative importance between multiple objectives. MOEQT generates critical scenarios to violate multiple requirements through dynamically interacting with the AV environment, ensuring comprehensive AV testing. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator and compare MOEQT with two baselines: a random strategy and a single-objective RL with a weighted reward function. Our evaluation results show that MOEQT achieved an overall better performance in identifying critical scenarios for violating multiple requirements than the baselines.
AIAug 7, 2025
StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward ModelsXiangxiang Zhang, Jingxuan Wei, Donghong Zhong et al.
Existing Vision-Language Models often struggle with complex, multi-question reasoning tasks where partial correctness is crucial for effective learning. Traditional reward mechanisms, which provide a single binary score for an entire response, are too coarse to guide models through intricate problems with multiple sub-parts. To address this, we introduce StructVRM, a method that aligns multimodal reasoning with Structured and Verifiable Reward Models. At its core is a model-based verifier trained to provide fine-grained, sub-question-level feedback, assessing semantic and mathematical equivalence rather than relying on rigid string matching. This allows for nuanced, partial credit scoring in previously intractable problem formats. Extensive experiments demonstrate the effectiveness of StructVRM. Our trained model, Seed-StructVRM, achieves state-of-the-art performance on six out of twelve public multimodal benchmarks and our newly curated, high-difficulty STEM-Bench. The success of StructVRM validates that training with structured, verifiable rewards is a highly effective approach for advancing the capabilities of multimodal models in complex, real-world reasoning domains.
LGJun 5, 2025
UniPTMs: The First Unified Multi-type PTM Site Prediction Model via Master-Slave Architecture-Based Multi-Stage Fusion Strategy and Hierarchical Contrastive LossYiyu Lin, Yan Wang, You Zhou et al.
As a core mechanism of epigenetic regulation in eukaryotes, protein post-translational modifications (PTMs) require precise prediction to decipher dynamic life activity networks. To address the limitations of existing deep learning models in cross-modal feature fusion, domain generalization, and architectural optimization, this study proposes UniPTMs: the first unified framework for multi-type PTM prediction. The framework innovatively establishes a "Master-Slave" dual-path collaborative architecture: The master path dynamically integrates high-dimensional representations of protein sequences, structures, and evolutionary information through a Bidirectional Gated Cross-Attention (BGCA) module, while the slave path optimizes feature discrepancies and recalibration between structural and traditional features using a Low-Dimensional Fusion Network (LDFN). Complemented by a Multi-scale Adaptive convolutional Pyramid (MACP) for capturing local feature patterns and a Bidirectional Hierarchical Gated Fusion Network (BHGFN) enabling multi-level feature integration across paths, the framework employs a Hierarchical Dynamic Weighting Fusion (HDWF) mechanism to intelligently aggregate multimodal features. Enhanced by a novel Hierarchical Contrastive loss function for feature consistency optimization, UniPTMs demonstrates significant performance improvements (3.2%-11.4% MCC and 4.2%-14.3% AP increases) over state-of-the-art models across five modification types and transcends the Single-Type Prediction Paradigm. To strike a balance between model complexity and performance, we have also developed a lightweight variant named UniPTMs-mini.
SDMar 14
Evaluating Semantic Fragility in Text-to-Audio Generation Systems Under Controlled Prompt PerturbationsJiahui Wu
Recent advances in text-to-audio generation enable models to translate natural-language descriptions into diverse musical output. However, the robustness of these systems under semantically equivalent prompt variations remains largely unexplored. Small linguistic changes may lead to substantial variation in generated audio, raising concerns about reliability in practical use. In this study, we evaluate the semantic fragility of text-to-audio systems under controlled prompt perturbations. We selected MusicGen-small, MusicGen-large, and Stable Audio 2.5 as representative models, and we evaluated them under Minimal Lexical Substitution (MLS), Intensity Shifts (IS), and Structural Rephrasing (SR). The proposed dataset contains 75 prompt groups designed to preserve semantic intent while introducing localized linguistic variation. Generated outputs are compared through complementary spectral, temporal, and semantic similarity measures, enabling robustness analysis across multiple representational levels. Experimental results show that larger models achieve improved semantic consistency, with MusicGen-large reaching cosine similarities of 0.77 under MLS and 0.82 under IS. However, acoustic and temporal analyses reveal persistent divergence across all models, even when embedding similarity remains high. These findings indicate that fragility arises primarily during semantic-to-acoustic realization rather than multi-modal embedding alignment. Our study introduces a controlled framework for evaluating robustness in text-to-audio generation and highlights the need for multi-level stability assessment in generative audio systems.
AISep 6, 2025
MSRFormer: Road Network Representation Learning using Multi-scale Feature Fusion of Heterogeneous Spatial InteractionsJian Yang, Jiahui Wu, Li Fang et al.
Transforming road network data into vector representations using deep learning has proven effective for road network analysis. However, urban road networks' heterogeneous and hierarchical nature poses challenges for accurate representation learning. Graph neural networks, which aggregate features from neighboring nodes, often struggle due to their homogeneity assumption and focus on a single structural scale. To address these issues, this paper presents MSRFormer, a novel road network representation learning framework that integrates multi-scale spatial interactions by addressing their flow heterogeneity and long-distance dependencies. It uses spatial flow convolution to extract small-scale features from large trajectory datasets, and identifies scale-dependent spatial interaction regions to capture the spatial structure of road networks and flow heterogeneity. By employing a graph transformer, MSRFormer effectively captures complex spatial dependencies across multiple scales. The spatial interaction features are fused using residual connections, which are fed to a contrastive learning algorithm to derive the final road network representation. Validation on two real-world datasets demonstrates that MSRFormer outperforms baseline methods in two road network analysis tasks. The performance gains of MSRFormer suggest the traffic-related task benefits more from incorporating trajectory data, also resulting in greater improvements in complex road network structures with up to 16% improvements compared to the most competitive baseline method. This research provides a practical framework for developing task-agnostic road network representation models and highlights distinct association patterns of the interplay between scale effects and flow heterogeneity of spatial interactions.
LGJun 10, 2024
Network-Based Transfer Learning Helps Improve Short-Term Crime Prediction AccuracyJiahui Wu, Vanessa Frias-Martinez
Deep learning architectures enhanced with human mobility data have been shown to improve the accuracy of short-term crime prediction models trained with historical crime data. However, human mobility data may be scarce in some regions, negatively impacting the correct training of these models. To address this issue, we propose a novel transfer learning framework for short-term crime prediction models, whereby weights from the deep learning crime prediction models trained in source regions with plenty of mobility data are transferred to target regions to fine-tune their local crime prediction models and improve crime prediction accuracy. Our results show that the proposed transfer learning framework improves the F1 scores for target cities with mobility data scarcity, especially when the number of months of available mobility data is small. We also show that the F1 score improvements are pervasive across different types of crimes and diverse cities in the US.
CYJun 6, 2024
Improving the Fairness of Deep-Learning, Short-term Crime Prediction with Under-reporting-aware ModelsJiahui Wu, Vanessa Frias-Martinez
Deep learning crime predictive tools use past crime data and additional behavioral datasets to forecast future crimes. Nevertheless, these tools have been shown to suffer from unfair predictions across minority racial and ethnic groups. Current approaches to address this unfairness generally propose either pre-processing methods that mitigate the bias in the training datasets by applying corrections to crime counts based on domain knowledge or in-processing methods that are implemented as fairness regularizers to optimize for both accuracy and fairness. In this paper, we propose a novel deep learning architecture that combines the power of these two approaches to increase prediction fairness. Our results show that the proposed model improves the fairness of crime predictions when compared to models with in-processing de-biasing approaches and with models without any type of bias correction, albeit at the cost of reducing accuracy.