LGJul 5, 2024
Continuous Sleep Depth Index Annotation with Deep Learning Yields Novel Digital Biomarkers for Sleep HealthSongchi Zhou, Ge Song, Haoqi Sun et al.
Traditional sleep staging categorizes sleep and wakefulness into five coarse-grained classes, overlooking subtle variations within each stage. It provides limited information about the duration of arousal and may hinder research on sleep fragmentation and relevant sleep disorders. To address this issue, we propose a deep learning method for automatic and scalable annotation of continuous sleep depth index (SDI) using existing discrete sleep staging labels. Our approach was validated using polysomnography from over 10,000 recordings across four large-scale cohorts. The results showcased a strong correlation between the decrease in sleep depth index and the increase in duration of arousal. Specific case studies indicated that the sleep depth index captured more nuanced sleep structures than conventional sleep staging. Gaussian mixture models based on the digital biomarkers extracted from the sleep depth index identified two subtypes of sleep, where participants in the disturbed sleep group had a higher prevalence of sleep apnea, insomnia, poor subjective sleep quality, hypertension, and cardiovascular disease. The disturbed subtype was associated with a 42% (hazard ratio 1.42, 95% CI 1.24-1.62) increased risk of mortality and a 29% (hazard ratio 1.29, 95% CI 1.00-1.67) increased risk of fatal cardiovascular disease. Our study underscores the utility of the proposed method for continuous sleep depth annotation, which could reveal more detailed information about the sleep structure and yield novel digital biomarkers for routine clinical use in sleep medicine.
73.5ROApr 3
Sim2Real-AD: A Modular Sim-to-Real Framework for Deploying VLM-Guided Reinforcement Learning in Real-World Autonomous DrivingZilin Huang, Zhengyang Wan, Zihao Sheng et al.
Deploying reinforcement learning policies trained in simulation to real autonomous vehicles remains a fundamental challenge, particularly for VLM-guided RL frameworks whose policies are typically learned with simulator-native observations and simulator-coupled action semantics that are unavailable on physical platforms. This paper presents Sim2Real-AD, a modular framework for zero-shot sim-to-real transfer of CARLA-trained VLM-guided RL policies to full-scale vehicles without any real-world RL training data. The framework decomposes the transfer problem into four components: a Geometric Observation Bridge (GOB) that converts monocular front-view images into simulator-compatible bird's-eye-view (BEV) observations, a Physics-Aware Action Mapping (PAM) that translates policy outputs into platform-agnostic physical commands, a Two-Phase Progressive Training (TPT) strategy that stabilizes adaptation by separating action-space and observation-space transfer, and a Real-time Deployment Pipeline (RDP) that integrates perception, policy inference, control conversion, and safety monitoring for closed-loop execution. Simulation experiments show that the framework preserves the relative performance ordering of representative RL algorithms across different reward paradigms and validate the contribution of each module. Zero-shot deployment on a full-scale Ford E-Transit achieves success rates of 90%, 80%, and 75% in car-following, obstacle avoidance, and stop-sign interaction scenarios, respectively. To the best of our knowledge, this study is among the first to demonstrate zero-shot closed-loop deployment of a CARLA-trained VLM-guided RL policy on a full-scale real vehicle without any real-world RL training data. The demo video and code are available at: https://zilin-huang.github.io/Sim2Real-AD-website/.
ROFeb 21, 2025
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language ModelsZihao Sheng, Zilin Huang, Yansong Qu et al.
Ensuring safety in autonomous driving systems remains a critical challenge, particularly in handling rare but potentially catastrophic safety-critical scenarios. While existing research has explored generating safety-critical scenarios for autonomous vehicle (AV) testing, there is limited work on effectively incorporating these scenarios into policy learning to enhance safety. Furthermore, developing training curricula that adapt to an AV's evolving behavioral patterns and performance bottlenecks remains largely unexplored. To address these challenges, we propose CurricuVLM, a novel framework that leverages Vision-Language Models (VLMs) to enable personalized curriculum learning for autonomous driving agents. Our approach uniquely exploits VLMs' multimodal understanding capabilities to analyze agent behavior, identify performance weaknesses, and dynamically generate tailored training scenarios for curriculum adaptation. Through comprehensive analysis of unsafe driving situations with narrative descriptions, CurricuVLM performs in-depth reasoning to evaluate the AV's capabilities and identify critical behavioral patterns. The framework then synthesizes customized training scenarios targeting these identified limitations, enabling effective and personalized curriculum learning. Extensive experiments on the Waymo Open Motion Dataset show that CurricuVLM outperforms state-of-the-art baselines across both regular and safety-critical scenarios, achieving superior performance in terms of navigation success, driving efficiency, and safety metrics. Further analysis reveals that CurricuVLM serves as a general approach that can be integrated with various RL algorithms to enhance autonomous driving systems. The code and demo video are available at: https://zihaosheng.github.io/CurricuVLM/.
SPOct 30, 2024
SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on BallistocardiogramsShuzhen Li, Yuxin Chen, Xuesong Chen et al.
Sleep monitoring plays a crucial role in maintaining good health, with sleep staging serving as an essential metric in the monitoring process. Traditional methods, utilizing medical sensors like EEG and ECG, can be effective but often present challenges such as unnatural user experience, complex deployment, and high costs. Ballistocardiography~(BCG), a type of piezoelectric sensor signal, offers a non-invasive, user-friendly, and easily deployable alternative for long-term home monitoring. However, reliable BCG-based sleep staging is challenging due to the limited sleep monitoring data available for BCG. A restricted training dataset prevents the model from generalization across populations. Additionally, transferring to BCG faces difficulty ensuring model robustness when migrating from other data sources. To address these issues, we introduce SleepNetZero, a zero-shot learning based approach for sleep staging. To tackle the generalization challenge, we propose a series of BCG feature extraction methods that align BCG components with corresponding respiratory, cardiac, and movement channels in PSG. This allows models to be trained on large-scale PSG datasets that are diverse in population. For the migration challenge, we employ data augmentation techniques, significantly enhancing generalizability. We conducted extensive training and testing on large datasets~(12393 records from 9637 different subjects), achieving an accuracy of 0.803 and a Cohen's Kappa of 0.718. ZeroSleepNet was also deployed in real prototype~(monitoring pads) and tested in actual hospital settings~(265 users), demonstrating an accuracy of 0.697 and a Cohen's Kappa of 0.589. To the best of our knowledge, this work represents the first known reliable BCG-based sleep staging effort and marks a significant step towards in-home health monitoring.
CVAug 9, 2025
SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident UnderstandingZihao Sheng, Zilin Huang, Yansong Qu et al.
Multimodal large language models (MLLMs) have achieved remarkable progress across a range of vision-language tasks and demonstrate strong potential for traffic accident understanding. However, existing MLLMs in this domain primarily focus on coarse-grained image-level or video-level comprehension and often struggle to handle fine-grained visual details or localized scene components, limiting their applicability in complex accident scenarios. To address these limitations, we propose SafePLUG, a novel framework that empowers MLLMs with both Pixel-Level Understanding and temporal Grounding for comprehensive traffic accident analysis. SafePLUG supports both arbitrary-shaped visual prompts for region-aware question answering and pixel-level segmentation based on language instructions, while also enabling the recognition of temporally anchored events in traffic accident scenarios. To advance the development of MLLMs for traffic accident understanding, we curate a new dataset containing multimodal question-answer pairs centered on diverse accident scenarios, with detailed pixel-level annotations and temporal event boundaries. Experimental results show that SafePLUG achieves strong performance on multiple tasks, including region-based question answering, pixel-level segmentation, temporal event localization, and accident event understanding. These capabilities lay a foundation for fine-grained understanding of complex traffic scenes, with the potential to improve driving safety and enhance situational awareness in smart transportation systems. The code, dataset, and model checkpoints will be made publicly available at: https://zihaosheng.github.io/SafePLUG
ROApr 25, 2025
Sky-Drive: A Distributed Multi-Agent Simulation Platform for Human-AI Collaborative and Socially-Aware Future TransportationZilin Huang, Zihao Sheng, Zhengyang Wan et al.
Recent advances in autonomous system simulation platforms have significantly enhanced the safe and scalable testing of driving policies. However, existing simulators do not yet fully meet the needs of future transportation research-particularly in enabling effective human-AI collaboration and modeling socially-aware driving agents. This paper introduces Sky-Drive, a novel distributed multi-agent simulation platform that addresses these limitations through four key innovations: (a) a distributed architecture for synchronized simulation across multiple terminals; (b) a multi-modal human-in-the-loop framework integrating diverse sensors to collect rich behavioral data; (c) a human-AI collaboration mechanism supporting continuous and adaptive knowledge exchange; and (d) a digital twin framework for constructing high-fidelity virtual replicas of real-world transportation environments. Sky-Drive supports diverse applications such as autonomous vehicle-human road users interaction modeling, human-in-the-loop training, socially-aware reinforcement learning, personalized driving development, and customized scenario generation. Future extensions will incorporate foundation models for context-aware decision support and hardware-in-the-loop testing for real-world validation. By bridging scenario generation, data collection, algorithm training, and hardware integration, Sky-Drive has the potential to become a foundational platform for the next generation of human-centered and socially-aware autonomous transportation systems research. The demo video and code are available at:https://sky-lab-uw.github.io/Sky-Drive-website/
QMJan 23, 2021
Predicting the Mechanical Properties of Biopolymer Gels Using Neural Networks Trained on Discrete Fiber Network DataYue Leng, Vahidullah Tac, Sarah Calve et al.
Biopolymer gels, such as those made out of fibrin or collagen, are widely used in tissue engineering applications and biomedical research. Moreover, fibrin naturally assembles into gels in vivo during wound healing and thrombus formation. Macroscale biopolymer gel mechanics are dictated by the microscale fiber network. Hence, accurate description of biopolymer gels can be achieved using representative volume elements (RVE) that explicitly model the discrete fiber networks of the microscale. These RVE models, however, cannot be efficiently used to model the macroscale due to the challenges and computational demands of multiscale coupling. Here, we propose the use of an artificial, fully connected neural network (FCNN) to efficiently capture the behavior of the RVE models. The FCNN was trained on 1100 fiber networks subjected to 121 biaxial deformations. The stress data from the RVE, together with the total energy and the condition of incompressibility of the surrounding matrix, were used to determine the derivatives of an unknown strain energy function with respect to the deformation invariants. During training, the loss function was modified to ensure convexity of the strain energy function and symmetry of its Hessian. A general FCNN model was coded into a user material subroutine (UMAT) in the software Abaqus. In this work, the FCNN trained on the discrete fiber network data was used in finite element simulations of fibrin gels using our UMAT. We anticipate that this work will enable further integration of machine learning tools with computational mechanics. It will also improve computational modeling of biological materials characterized by a multiscale structure.