SPSep 12, 2023
Respiratory Disease Classification and Biometric Analysis Using Biosignals from Digital StethoscopesConstantino Álvarez Casado, Manuel Lage Cañellas, Matteo Pedone et al.
Respiratory diseases remain a leading cause of mortality worldwide, highlighting the need for faster and more accurate diagnostic tools. This work presents a novel approach leveraging digital stethoscope technology for automatic respiratory disease classification and biometric analysis. Our approach has the potential to significantly enhance traditional auscultation practices. By leveraging one of the largest publicly available medical database of respiratory sounds, we train machine learning models to classify various respiratory health conditions. Our method differs from conventional methods by using Empirical Mode Decomposition (EMD) and spectral analysis techniques to isolate clinically relevant biosignals embedded within acoustic data captured by digital stethoscopes. This approach focuses on information closely tied to cardiovascular and respiratory patterns within the acoustic data. Spectral analysis and filtering techniques isolate Intrinsic Mode Functions (IMFs) strongly correlated with these physiological phenomena. These biosignals undergo a comprehensive feature extraction process for predictive modeling. These features then serve as input to train several machine learning models for both classification and regression tasks. Our approach achieves high accuracy in both binary classification (89% balanced accuracy for healthy vs. diseased) and multi-class classification (72% balanced accuracy for specific diseases like pneumonia and COPD). For the first time, this work introduces regression models capable of estimating age and body mass index (BMI) based solely on acoustic data, as well as a model for sex classification. Our findings underscore the potential of intelligent digital stethoscopes to significantly enhance assistive and remote diagnostic capabilities, contributing to advancements in digital health, telehealth, and remote patient monitoring.
CVFeb 12
Thermal Imaging for Contactless Cardiorespiratory and Sudomotor Response MonitoringConstantino Álvarez Casado, Mohammad Rahman, Sasan Sharifipour et al.
Thermal infrared imaging captures skin temperature changes driven by autonomic regulation and can potentially provide contactless estimation of electrodermal activity (EDA), heart rate (HR), and breathing rate (BR). While visible-light methods address HR and BR, they cannot access EDA, a standard marker of sympathetic activation. This paper characterizes the extraction of these three biosignals from facial thermal video using a signal-processing pipeline that tracks anatomical regions, applies spatial aggregation, and separates slow sudomotor trends from faster cardiorespiratory components. For HR, we apply an orthogonal matrix image transformation (OMIT) decomposition across multiple facial regions of interest (ROIs), and for BR we average nasal and cheek signals before spectral peak detection. We evaluate 288 EDA configurations and the HR/BR pipeline on 31 sessions from the public SIMULATOR STUDY 1 (SIM1) driver monitoring dataset. The best fixed EDA configuration (nose region, exponential moving average) reaches a mean absolute correlation of $0.40 \pm 0.23$ against palm EDA, with individual sessions reaching 0.89. BR estimation achieves a mean absolute error of $3.1 \pm 1.1$ bpm, while HR estimation yields $13.8 \pm 7.5$ bpm MAE, limited by the low camera frame rate (7.5 Hz). We report signal polarity alternation across sessions, short thermodynamic latency for well-tracked signals, and condition-dependent and demographic effects on extraction quality. These results provide baseline performance bounds and design guidance for thermal contactless biosignal estimation.
CVSep 14, 2023
Facial Kinship Verification from remote photoplethysmographyXiaoting Wu, Xiaoyi Feng, Constantino Álvarez Casado et al.
Facial Kinship Verification (FKV) aims at automatically determining whether two subjects have a kinship relation based on human faces. It has potential applications in finding missing children and social media analysis. Traditional FKV faces challenges as it is vulnerable to spoof attacks and raises privacy issues. In this paper, we explore for the first time the FKV with vital bio-signals, focusing on remote Photoplethysmography (rPPG). rPPG signals are extracted from facial videos, resulting in a one-dimensional signal that measures the changes in visible light reflection emitted to and detected from the skin caused by the heartbeat. Specifically, in this paper, we employed a straightforward one-dimensional Convolutional Neural Network (1DCNN) with a 1DCNN-Attention module and kinship contrastive loss to learn the kin similarity from rPPGs. The network takes multiple rPPG signals extracted from various facial Regions of Interest (ROIs) as inputs. Additionally, the 1DCNN attention module is designed to learn and capture the discriminative kin features from feature embeddings. Finally, we demonstrate the feasibility of rPPG to detect kinship with the experiment evaluation on the UvANEMO Smile Database from different kin relations.
SPDec 11, 2023
Non-contact Multimodal Indoor Human Monitoring Systems: A SurveyLe Ngu Nguyen, Praneeth Susarla, Anirban Mukherjee et al.
Indoor human monitoring systems leverage a wide range of sensors, including cameras, radio devices, and inertial measurement units, to collect extensive data from users and the environment. These sensors contribute diverse data modalities, such as video feeds from cameras, received signal strength indicators and channel state information from WiFi devices, and three-axis acceleration data from inertial measurement units. In this context, we present a comprehensive survey of multimodal approaches for indoor human monitoring systems, with a specific focus on their relevance in elderly care. Our survey primarily highlights non-contact technologies, particularly cameras and radio devices, as key components in the development of indoor human monitoring systems. Throughout this article, we explore well-established techniques for extracting features from multimodal data sources. Our exploration extends to methodologies for fusing these features and harnessing multiple modalities to improve the accuracy and robustness of machine learning models. Furthermore, we conduct comparative analysis across different data modalities in diverse human monitoring tasks and undertake a comprehensive examination of existing multimodal datasets. This extensive survey not only highlights the significance of indoor human monitoring systems but also affirms their versatile applications. In particular, we emphasize their critical role in enhancing the quality of elderly care, offering valuable insights into the development of non-contact monitoring solutions applicable to the needs of aging populations.
CVMay 22, 2024
OMuSense-23: A Multimodal Dataset for Contactless Breathing Pattern Recognition and Biometric AnalysisManuel Lage Cañellas, Le Nguyen, Anirban Mukherjee et al.
In the domain of non-contact biometrics and human activity recognition, the lack of a versatile, multimodal dataset poses a significant bottleneck. To address this, we introduce the Oulu Multi Sensing (OMuSense-23) dataset that includes biosignals obtained from a mmWave radar, and an RGB-D camera. The dataset features data from 50 individuals in three distinct poses -- standing, sitting, and lying down -- each featuring four specific breathing pattern activities: regular breathing, reading, guided breathing, and apnea, encompassing both typical situations (e.g., sitting with normal breathing) and critical conditions (e.g., lying down without breathing). In our work, we present a detailed overview of the OMuSense-23 dataset, detailing the data acquisition protocol, describing the process for each participant. In addition, we provide, a baseline evaluation of several data analysis tasks related to biometrics, breathing pattern recognition and pose identification. Our results achieve a pose identification accuracy of 87\% and breathing pattern activity recognition of 83\% using features extracted from biosignals. The OMuSense-23 dataset is publicly available as resource for other researchers and practitioners in the field.
CLMar 14, 2025
Palette of Language Models: A Solver for Controlled Text GenerationZhe Yang, Yi Huang, Yaqin Chen et al.
Recent advancements in large language models have revolutionized text generation with their remarkable capabilities. These models can produce controlled texts that closely adhere to specific requirements when prompted appropriately. However, designing an optimal prompt to control multiple attributes simultaneously can be challenging. A common approach is to linearly combine single-attribute models, but this strategy often overlooks attribute overlaps and can lead to conflicts. Therefore, we propose a novel combination strategy inspired by the Law of Total Probability and Conditional Mutual Information Minimization on generative language models. This method has been adapted for single-attribute control scenario and is termed the Palette of Language Models due to its theoretical linkage between attribute strength and generation style, akin to blending colors on an artist's palette. Moreover, positive correlation and attribute enhancement are advanced as theoretical properties to guide a rational combination strategy design. We conduct experiments on both single control and multiple control settings, and achieve surpassing results.
CVJun 24, 2019
Audio-Visual Kinship VerificationXiaoting Wu, Eric Granger, Xiaoyi Feng
Visual kinship verification entails confirming whether or not two individuals in a given pair of images or videos share a hypothesized kin relation. As a generalized face verification task, visual kinship verification is particularly difficult with low-quality found Internet data. Due to uncontrolled variations in background, pose, facial expression, blur, illumination and occlusion, state-of-the-art methods fail to provide high level of recognition accuracy. As with many other visual recognition tasks, kinship verification may benefit from combining visual and audio signals. However, voice-based kinship verification has received very little prior attention. We hypothesize that the human voice contains kin-related cues that are complementary to visual cues. In this paper, we address, for the first time, the use of audio-visual information from face and voice modalities to perform kinship verification. We first propose a new multi-modal kinship dataset, called TALking KINship (TALKIN), that contains several pairs of Internet-quality video sequences. Using TALKIN, we then study the utility of various kinship verification methods including traditional local feature based methods, statistical methods and more recent deep learning approaches. Then, early and late fusion methods are evaluated on the TALKIN dataset for the study of kinship verification with both face and voice modalities. Finally, we propose a deep Siamese fusion network with contrastive loss for multi-modal fusion of kinship relations. Extensive experiments on the TALKIN dataset indicate that by combining face and voice modalities, the proposed Siamese network can provide a significantly higher level of accuracy compared to baseline uni-modal and multi-modal fusion techniques. Experimental results also indicate that audio (vocal) information is complementary (to facial information) and useful for kinship verification.