CV · Nov 28, 2021

Automated Detection of Patients in Hospital Video Recordings

Siddharth Sharma, Florian Dubost, Christopher Lee-Messer, Daniel Rubin

arXiv:2111.14270v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for real-time patient tracking in clinical settings, which could supplement clinical oversight and reduce the resource-intensive monitoring done by healthcare staff. However, it is incremental: it applies an existing method to new data.

The paper tackles the problem of automatically detecting patients in hospital video recordings for epilepsy monitoring. It shows that fine-tuning a pre-trained Mask R-CNN on a subset of a curated dataset of 45 videos substantially improves detection, reaching a mean average precision of 0.64.

In a clinical setting, epilepsy patients are monitored with video electroencephalogram (EEG) tests. A video EEG captures the patient on video while an EEG device simultaneously records their brainwaves. There are currently no automated methods for tracking the patient's location during a seizure, and video recordings of hospital patients differ substantially from publicly available video benchmark datasets. For example, the camera angle can be unusual, and patients can be partially covered by bedding sheets and electrode sets. Tracking a patient in real time with video EEG would be a promising step toward improving the quality of healthcare. Specifically, an automated patient detection system could supplement clinical oversight and reduce the resource-intensive efforts of nurses and doctors who must monitor patients continuously. We evaluate an ImageNet pre-trained Mask R-CNN, a standard deep learning model for object detection, on the task of patient detection using a dataset of 45 videos of hospital patients that we aggregated and curated for this work. We show that without fine-tuning, ImageNet pre-trained Mask R-CNN models perform poorly on such data. By fine-tuning the models on a subset of our dataset, we observe a substantial improvement in patient detection performance, with a mean average precision of 0.64. We also show that results vary substantially from one video clip to another.
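The abstract reports detection quality as a mean average precision (mAP) of 0.64 but does not spell out the evaluation protocol. The sketch below shows one common way single-class ("patient") average precision is computed, assuming axis-aligned bounding boxes (not Mask R-CNN's segmentation masks), greedy matching of detections in descending confidence order, and all-point (VOC-style) interpolation. The function names, the frame-keyed dictionaries, and the averaging over IoU thresholds 0.50 to 0.95 are hypothetical choices for illustration; the last one only loosely mimics COCO, which uses 101-point interpolation, so the paper's actual numbers may have been computed differently.

```python
from typing import Dict, List, Tuple

# Hypothetical types for this sketch: boxes are (x1, y1, x2, y2) in pixels,
# detections are (confidence, box), and frames are keyed by an arbitrary string id.
Box = Tuple[float, float, float, float]
Detection = Tuple[float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Dict[str, List[Detection]],
                      gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """Single-class ("patient") AP with all-point (VOC-style) interpolation."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    # Pool detections from all frames and rank them by descending confidence.
    dets = [(score, frame, box) for frame, items in preds.items() for score, box in items]
    dets.sort(key=lambda d: d[0], reverse=True)
    matched = {frame: [False] * len(boxes) for frame, boxes in gts.items()}

    tp_cum, precisions, recalls = 0, [], []
    for rank, (_, frame, box) in enumerate(dets, start=1):
        # Find the ground-truth box in the same frame that overlaps most.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(frame, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A detection is a true positive only if it clears the IoU threshold
        # and its ground-truth box has not already been claimed.
        if best_j >= 0 and best_iou >= iou_thr and not matched[frame][best_j]:
            matched[frame][best_j] = True
            tp_cum += 1
        precisions.append(tp_cum / rank)
        recalls.append(tp_cum / n_gt)

    # Replace each precision with the maximum precision at this or any later rank,
    # so the precision-recall curve is non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_ap_over_iou(preds: Dict[str, List[Detection]],
                     gts: Dict[str, List[Box]]) -> float:
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95 (loosely COCO-style)."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    # Toy example: two frames, one patient per frame, three detections.
    gts = {"f0": [(10, 10, 50, 50)], "f1": [(20, 20, 60, 60)]}
    preds = {
        "f0": [(0.9, (12, 12, 50, 50)), (0.3, (100, 100, 120, 120))],
        "f1": [(0.8, (0, 0, 15, 15))],
    }
    print(average_precision(preds, gts))  # 0.5: only one of two patients is found
    print(mean_ap_over_iou(preds, gts))   # 0.45: the 0.95 threshold rejects the match
```

Running the same functions on each clip's frames separately is one simple way to surface the kind of per-clip variation the abstract describes.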
