MM AINov 18, 2025

Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services

Liuyi Jin, Amran Haroon, Radu Stoleru, Pasan Gunawardena, Michael Middleton, Jeeeun Kim

arXiv:2511.14119v1

Originality Incremental advance

AI Analysis

This addresses the critical need for timely and accurate pre-arrival information in emergency medical services, representing a domain-specific advancement with potential to transform EMS operations.

The paper tackles the problem of limited analytics capabilities in emergency medical services by developing TeleEMS, a mobile live video analytics system that fuses audio and video for pre-arrival multimodal inference. Results show that their domain-specialized LLM (EMSLlama) outperforms GPT-4o with an exact-match score of 0.89 vs. 0.57, and text-vital fusion improves inference robustness for intervention recommendations.

Timely and accurate pre-arrival video streaming and analytics are critical for emergency medical services (EMS) to deliver life-saving interventions. Yet, current-generation EMS infrastructure remains constrained by one-to-one video streaming and limited analytics capabilities, leaving dispatchers and EMTs to manually interpret overwhelming, often noisy or redundant information in high-stress environments. We present TeleEMS, a mobile live video analytics system that enables pre-arrival multimodal inference by fusing audio and video into a unified decision-making pipeline before EMTs arrive on scene. TeleEMS comprises two key components: TeleEMS Client and TeleEMS Server. The TeleEMS Client runs across phones, smart glasses, and desktops to support bystanders, EMTs en route, and 911 dispatchers. The TeleEMS Server, deployed at the edge, integrates EMS-Stream, a communication backbone that enables smooth multi-party video streaming. On top of EMSStream, the server hosts three real-time analytics modules: (1) audio-to-symptom analytics via EMSLlama, a domain-specialized LLM for robust symptom extraction and normalization; (2) video-to-vital analytics using state-of-the-art rPPG methods for heart rate estimation; and (3) joint text-vital analytics via PreNet, a multimodal multitask model predicting EMS protocols, medication types, medication quantities, and procedures. Evaluation shows that EMSLlama outperforms GPT-4o (exact-match 0.89 vs. 0.57) and that text-vital fusion improves inference robustness, enabling reliable pre-arrival intervention recommendations. TeleEMS demonstrates the potential of mobile live video analytics to transform EMS operations, bridging the gap between bystanders, dispatchers, and EMTs, and paving the way for next-generation intelligent EMS infrastructure.

View on arXiv PDF

Similar