Lionel Tarassenko

h-index 76

8 papers

22,973 citations

Novelty 57%

AI Score 39

Ranked #80,412 of 194,257 authors (top 41%) · #27,170 in CV (top 46%)

8 Papers

3.9 · CV · Jun 6, 2023

Deep Learning-Enabled Sleep Staging From Vital Signs and Activity Measured Using a Near-Infrared Video Camera

Jonathan Carter, João Jorge, Bindia Venugopal et al.

Conventional sleep monitoring is time-consuming, expensive and uncomfortable, requiring a large number of contact sensors to be attached to the patient. Video data is commonly recorded as part of a sleep laboratory assessment. If accurate sleep staging could be achieved solely from video, this would overcome many of the problems of traditional methods. In this work we use heart rate, breathing rate and activity measures, all derived from a near-infrared video camera, to perform sleep stage classification. We use a deep transfer learning approach to overcome data scarcity, by using an existing contact-sensor dataset to learn effective representations from the heart and breathing rate time series. Using a dataset of 50 healthy volunteers, we achieve an accuracy of 73.4% and a Cohen's kappa of 0.61 in four-class sleep stage classification, establishing a new state-of-the-art for video-based sleep staging.
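
The abstract above reports both accuracy and Cohen's kappa. As a minimal sketch of how the two relate, the snippet below computes kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance. The function name and the four stage labels are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: the fraction of epochs where both labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical four-class labels (assumed names, for illustration only).
reference = ["wake", "wake", "light", "light", "deep", "deep", "rem", "rem"]
predicted = ["wake", "wake", "light", "deep", "deep", "deep", "rem", "light"]
kappa = cohens_kappa(reference, predicted)
```

Because kappa discounts chance agreement, it is lower than raw accuracy whenever the classifier is imperfect, which is consistent with the gap between the two figures reported above.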

3.1 · LG · Jul 6

Video-based detection of cessation of breathing in pre-term infants using machine learning

Dineo Serame, Lionel Tarassenko, Mauricio Villarroel

Pre-term infants are susceptible to potentially harmful apnoea-related cessations of breathing due to immature respiratory control. However, reliable respiratory monitoring in the neonatal intensive care unit (NICU) remains challenging because motion artefacts, sensor displacement, and skin fragility can compromise contact-based measurements. Non-contact video monitoring offers a complementary approach that does not depend on adhesive sensors while providing additional respiratory information. We investigated whether camera-based signals can detect apnoea-related cessation of breathing (COBE) and provide complementary information to routinely acquired physiological signals. Using video and clinical recordings from 30 pre-term infants, respiratory motion was extracted from dynamically tracked torso regions to generate camera-derived time-series signals. Camera-only models were trained using residual network (ResNet) architectures, while hybrid models combined video-derived signals with impedance pneumography (IP), ECG-derived respiration (EDR), and the PPG-derived respiratory envelope. Camera-only models achieved a balanced accuracy of 76.9%, demonstrating the feasibility of non-contact COBE detection. Combining video-derived features with IP improved balanced accuracy to 90.6%, outperforming either modality alone and indicating that video provides respiratory information beyond standard physiological signals. These findings show that video-derived signals contain clinically relevant respiratory features and enhance COBE detection when combined with conventional physiological signals. This supports non-contact video as a complementary modality for automated COBE detection and highlights its potential to improve the robustness of neonatal respiratory monitoring.

2.7 · LG · Jun 22

Deep learning-based detection of cessation of breathing in pre-term infants

Dineo Serame, Lionel Tarassenko, Mauricio Villarroel

Apnoea of prematurity is characterised by recurrent episodes of cessation of breathing and remains difficult to detect reliably using routinely monitored physiological signals in the Neonatal Intensive Care Unit (NICU). Existing bedside monitors rely primarily on respiratory rate and oxygen saturation thresholds, often generating high false-positive alarm rates and missing short or irregular events. Improving automated detection using routinely acquired clinical signals could enhance identification of clinically meaningful events without additional sensing hardware. We evaluated deep learning-based detection of apnoea-related Cessation Of BrEathing (COBE) events using impedance pneumography (IP), electrocardiography (ECG), and photoplethysmography (PPG) signals from approximately 430 hours of NICU recordings collected from 24 pre-term infants. Three independent reviewers annotated COBE events, producing a dataset of 346 COBE and 608 non-COBE events. We compared a shallow convolutional neural network (CNN), residual networks (ResNets), and a ConvNeXt architecture using an independent held-out test set. Across all architectures, detection performance was influenced more strongly by signal modality than by architectural complexity. Unimodal IP-based models achieved balanced accuracies of 86.8-88.0%, outperforming ECG-derived (62.6-69.7%) and PPG-derived (65.1-66.4%) respiratory surrogates. Multimodal fusion yielded modest improvements over IP alone. The best-performing model, a ConvNeXt architecture combining IP and PPG inputs, achieved 88.7% balanced accuracy and an F1 score of 0.75 on the independent test set. These findings demonstrate that deep learning models applied to routinely monitored NICU signals can reliably detect COBE events and highlight the importance of signal modality in data-constrained neonatal monitoring settings.
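
The results above are reported as balanced accuracy and F1 score, which suit an imbalanced set such as 346 COBE versus 608 non-COBE events. The following is a minimal sketch of both metrics for binary labels, assuming 1 marks a COBE event and 0 a non-COBE event; the function name and the toy labels are hypothetical and do not come from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Return (balanced_accuracy, f1) for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    # Balanced accuracy averages the per-class recalls, so class imbalance
    # does not inflate the score.
    balanced_accuracy = (sensitivity + specificity) / 2.0
    # F1 is the harmonic mean of precision and recall, written in count form.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return balanced_accuracy, f1


# Toy labels, assumed for illustration only.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
bal_acc, f1 = binary_metrics(truth, preds)
```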

0.6 · CV · Jun 18

InfantFace: Detecting infant faces in neonatal clinical environments

Abdullah Bin-Obaid, Maria M. Cobo, Rebeccah Slater et al.

Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain and distress related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major challenges persist in neonatal clinical environments. Cluttered backgrounds, illumination changes and poor lighting conditions can reduce the accuracy of face detection models. Clinical interventions, monitoring equipment and, in some cases, medical devices can obstruct the face, making visual assessment difficult. We propose a one-stage YOLOv11m-based model tailored for face detection of infants in neonatal clinical environments. We combined multiple publicly available datasets (VGGFace2, CelebA, FDDB, WIDER FACE) to train and evaluate our proposed model. We then fine-tuned our model on a neonatal research dataset involving 228 videos from 114 recording sessions of 113 independent infants. Before fine-tuning, our model achieved an AP50 of 0.87, surpassing the performance of three state-of-the-art general face detectors. Performance improved further to an AP50 of 0.96 after clinical-domain adaptation. Evaluating face detection performance across different datasets remains a challenge due to the lack of publicly available neonatal datasets. Prioritising the creation of such datasets, while upholding appropriate privacy safeguards and ethical standards in their creation and use, would greatly support further progress in this field.
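
AP50 is average precision with a detection counted as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, not the full precision-recall computation; the box format (x_min, y_min, x_max, y_max) and the function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap are clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, truth_box):
    """True when a predicted box would count as a hit under the AP50 criterion."""
    return iou(pred_box, truth_box) >= 0.5


# Hypothetical face boxes in pixel coordinates.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
hit = is_match_at_50((0, 0, 10, 10), (2, 0, 12, 10))
```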

3.8 · CV · Jun 18

NeoLoc-68: End-to-end 68-point neonatal facial landmark localisation in neonatal clinical environments

Abdullah Bin-Obaid, Maria M. Cobo, Rebeccah Slater et al.

Facial landmark localisation is a prerequisite for developing automated, non-contact neonatal pain assessment methods. Clinicians use pain scales to judge the severity of pain, many of which rely on facial expression. However, facial landmark detectors trained on adult faces perform poorly in neonatal clinical environments due to frequent occlusions caused by medical equipment, varied head poses, and challenging imaging conditions, including motion blur triggered by sudden pain-related movements. We propose an end-to-end facial landmark detector capable of predicting 68 landmarks on neonatal faces in clinical environments. We combined 37,459 single-face images from 11 public datasets, standardised to 68-point markup, with 1,123 manually annotated frames from a neonatal research dataset (totalling over 76,000 landmarks). A YOLO-based keypoint model was adapted to regress the facial landmarks, initialised with weights from a pretrained neonatal face detector. On public datasets, our proposed model achieved state-of-the-art performance: Normalised Mean Error (NME) = 5.37, Failure Rate (FR) = 12.5%, Area Under the Cumulative Error Curve (AUC) at AUC0.08 = 38.00% and AUC0.1 = 48.70%. On the clinical neonatal test set, before fine-tuning, the model achieved the lowest Detection Failure Rate (DFR) = 5.3% among all baselines and showed strong generalisation. After fine-tuning, performance improved further to NME = 6.36, FR = 22.30%, DFR = 1.77%, AUC0.08 = 29.24% and AUC0.1 = 40.25%. To the best of our knowledge, this represents the first end-to-end 68-point neonatal facial landmark detection model. With further dataset expansion and refinement, it could support downstream tasks in neonatal health monitoring and pain-related facial analysis.
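
The landmark metrics above can be illustrated with a short sketch. Normalised Mean Error is commonly computed as the mean Euclidean distance between predicted and ground-truth landmarks divided by a normalising length, and the failure rate as the fraction of images whose NME exceeds a threshold. The abstract does not state the normalising length or the failure threshold, so the `norm` argument and the 0.08 default below are assumptions, as are the function names.

```python
import math


def normalised_mean_error(pred, truth, norm):
    """Mean point-to-point distance over landmarks, divided by `norm`.

    `norm` is an assumed normalising length (for example inter-ocular
    distance or face-box size); the abstract does not say which was used.
    """
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("landmark lists must be non-empty and of equal length")
    if norm <= 0:
        raise ValueError("norm must be positive")
    total = sum(math.dist(p, t) for p, t in zip(pred, truth))
    return total / (len(pred) * norm)


def failure_rate(per_image_nme, threshold=0.08):
    """Fraction of images whose NME exceeds `threshold` (assumed value)."""
    if not per_image_nme:
        raise ValueError("need at least one image")
    return sum(1 for e in per_image_nme if e > threshold) / len(per_image_nme)


# Three hypothetical landmarks rather than 68, to keep the example short.
truth_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pred_pts = [(3.0, 4.0), (10.0, 0.0), (0.0, 12.0)]
error = normalised_mean_error(pred_pts, truth_pts, norm=10.0)
rate = failure_rate([0.05, 0.09, 0.20, 0.07])
```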

18.0 · HC · Apr 26, 2025 · Code

Clinical knowledge in LLMs does not translate to human interactions

Andrew M. Bean, Rebecca Payne, Guy Parsons et al. · Oxford

Global healthcare providers are exploring use of large language models (LLMs) to provide medical advice to the public. LLMs now achieve nearly perfect scores on medical licensing exams, but this does not necessarily translate to accurate performance in real-world settings. We tested if LLMs can assist members of the public in identifying underlying conditions and choosing a course of action (disposition) in ten medical scenarios in a controlled study with 1,298 participants. Participants were randomly assigned to receive assistance from an LLM (GPT-4o, Llama 3, Command R+) or a source of their choice (control). Tested alone, LLMs complete the scenarios accurately, correctly identifying conditions in 94.9% of cases and disposition in 56.3% on average. However, participants using the same LLMs identified relevant conditions in less than 34.5% of cases and disposition in less than 44.2%, both no better than the control group. We identify user interactions as a challenge to the deployment of LLMs for medical advice. Standard benchmarks for medical knowledge and simulated patient interactions do not predict the failures we find with human participants. Moving forward, we recommend systematic human user testing to evaluate interactive capabilities prior to public deployments in healthcare.

6.4 · LG · Nov 7, 2024 · Code

wav2sleep: A Unified Multi-Modal Approach to Sleep Stage Classification from Physiological Signals

Jonathan F. Carter, Lionel Tarassenko

Accurate classification of sleep stages from less obtrusive sensor measurements such as the electrocardiogram (ECG) or photoplethysmogram (PPG) could enable important applications in sleep medicine. Existing approaches to this problem have typically used deep learning models designed and trained to operate on one or more specific input signals. However, the datasets used to develop these models often do not contain the same sets of input signals. Some signals, particularly PPG, are much less prevalent than others, and this has previously been addressed with techniques such as transfer learning. Additionally, only training on one or more fixed modalities precludes cross-modal information transfer from other sources, which has proved valuable in other problem domains. To address this, we introduce wav2sleep, a unified model designed to operate on variable sets of input signals during training and inference. After jointly training on over 10,000 overnight recordings from six publicly available polysomnography datasets, including SHHS and MESA, wav2sleep outperforms existing sleep stage classification models across test-time input combinations including ECG, PPG, and respiratory signals.

3.7 · CV · Apr 4, 2024

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Jonathan F. Carter, João Jorge, Oliver Gibson et al.

Advances in camera-based physiological monitoring have enabled the robust, non-contact measurement of respiration and the cardiac pulse, which are known to be indicative of the sleep stage. This has led to research into camera-based sleep monitoring as a promising alternative to "gold-standard" polysomnography, which is cumbersome, expensive to administer, and hence unsuitable for longer-term clinical studies. In this paper, we introduce SleepVST, a transformer model which enables state-of-the-art performance in camera-based sleep stage classification (sleep staging). After pre-training on contact sensor data, SleepVST outperforms existing methods for cardio-respiratory sleep staging on the SHHS and MESA datasets, achieving total Cohen's kappa scores of 0.75 and 0.77 respectively. We then show that SleepVST can be successfully transferred to cardio-respiratory waveforms extracted from video, enabling fully contact-free sleep staging. Using a video dataset of 50 nights, we achieve a total accuracy of 78.8% and a Cohen's κ of 0.71 in four-class video-based sleep staging, setting a new state-of-the-art in the domain.