CV · Sep 21, 2022
Self-adversarial Multi-scale Contrastive Learning for Semantic Segmentation of Thermal Facial Images
Jitesh Joshi, Nadia Bianchi-Berthouze, Youngjun Cho
Segmentation of thermal facial images is a challenging task because facial features often lack salience in scenes with a high thermal dynamic range and are affected by occlusion. The limited availability of datasets from unconstrained settings further limits the use of state-of-the-art segmentation networks, loss functions and learning strategies, which have been built and validated for RGB images. To address this challenge, we propose the Self-Adversarial Multi-scale Contrastive Learning (SAM-CL) framework as a new training strategy for thermal image segmentation. The SAM-CL framework consists of a SAM-CL loss function and a thermal image augmentation (TiAug) module as a domain-specific augmentation technique. We use the Thermal-Face-Database to demonstrate the effectiveness of our approach. Experiments conducted on existing segmentation networks (UNET, Attention-UNET, DeepLabV3 and HRNetv2) show consistent performance gains from the SAM-CL framework. Furthermore, we present a qualitative analysis with the UBComfort and DeepBreath datasets to discuss how our proposed methods handle unconstrained situations.
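The exact SAM-CL loss is defined in the paper, not in this abstract. As a generic illustration of the contrastive ingredient only, the sketch below computes a supervised InfoNCE-style loss over a few pixel embeddings: embeddings of the same class are pulled together and others pushed apart. The function name, temperature value and toy data are assumptions, and the self-adversarial and multi-scale parts are not modelled.

```python
import math

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised InfoNCE-style loss (illustrative sketch, not SAM-CL itself).

    embeddings: list of non-zero feature vectors (e.g. sampled pixel embeddings)
    labels: class label per embedding (e.g. facial region id)
    """
    # L2-normalise so that dot products become cosine similarities
    normed = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v))
        normed.append([x / norm for x in v])
    n = len(normed)
    total, count = 0.0, 0
    for i in range(n):
        # temperature-scaled similarity from anchor i to every other sample
        sims = {j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        for j in sims:
            if labels[j] == labels[i]:
                # -log softmax probability of the positive pair (i, j)
                total += log_denom - sims[j]
                count += 1
    return total / count if count else 0.0
```

With well-separated classes the loss is near zero; shuffling the labels so that dissimilar embeddings count as positives makes it large.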
DL · Jan 15, 2023
TextileNet: A Material Taxonomy-based Fashion Textile Dataset
Shu Zhong, Miriam Ribul, Youngjun Cho et al.
The rise of Machine Learning (ML) is gradually digitalizing and reshaping the fashion industry. Recent years have witnessed a number of fashion AI applications, for example, virtual try-ons. Textile material identification and categorization play a crucial role in the fashion textile sector, including fashion design, retail, and recycling. At the same time, Net Zero is a global goal, and the fashion industry is undergoing a significant change so that textile materials can be reused, repaired and recycled in a sustainable manner. Automatically identifying the textile materials of garments remains a challenge, as we lack a low-cost and effective identification technique. In light of this, we build the first fashion textile dataset, TextileNet, based on textile material taxonomies: a fibre taxonomy and a fabric taxonomy generated in collaboration with material scientists. TextileNet can be used to train and evaluate state-of-the-art Deep Learning models for textile materials. We hope to standardize textile-related datasets through the use of taxonomies. TextileNet contains 33 fibre labels and 27 fabric labels, and has 760,949 images in total. We use standard Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to establish baselines for this dataset. Future applications for this dataset range from textile classification to optimization of the textile supply chain and interactive design for consumers. We envision that this can contribute to the development of a new AI-based fashion platform.
APP-PH · Mar 19, 2018
Sensorless Resonance Tracking of Resonant Electromagnetic Actuator through Back-EMF Estimation for Mobile Devices
Youngjun Cho
Resonant electromagnetic actuators have been broadly used as vibration motors for mobile devices, given their ability to generate relatively fast, strong, and controllable vibration force at a given resonant frequency. However, because these actuators rely on mechanical resonance, their use is limited to situations where the resonant frequency is known and does not shift. In reality, many factors alter the resonant frequency: for example, manufacturing tolerances, worn mechanical components such as a spring, and nonlinearity associated with different input voltage levels. Here, we describe a sensorless resonance tracking method that actuates the motor and automatically detects its unknown damped natural frequency by estimating the back electromotive force (EMF) and the movement of the inner mass. We demonstrate the tracking performance of the proposed method through a series of experiments. This approach has the potential to control residual vibrations and thereby improve vibrotactile feedback, which could be used in human-computer interaction and in cognitive and affective neuroscience research.
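To make the back-EMF idea concrete, the sketch below assumes a simple series R-L coil model, V = R·i + L·di/dt + e, so the back-EMF e can be recovered from sampled voltage and current. It then estimates the oscillation frequency from the spacing of rising zero crossings of e, which during free decay oscillates at the damped natural frequency. The coil model, the backward-difference derivative and the zero-crossing estimator are assumptions for illustration; the paper's estimator may differ.

```python
import math

def estimate_back_emf(voltage, current, resistance, inductance, dt):
    """Back-EMF e = V - R*i - L*di/dt under an assumed series R-L coil model."""
    emf = []
    for k in range(1, len(voltage)):
        di_dt = (current[k] - current[k - 1]) / dt  # backward-difference derivative
        emf.append(voltage[k] - resistance * current[k] - inductance * di_dt)
    return emf

def zero_crossing_frequency(signal, dt):
    """Estimate oscillation frequency (Hz) from the spacing of rising zero crossings."""
    crossings = []
    for k in range(1, len(signal)):
        if signal[k - 1] < 0 <= signal[k]:
            # linearly interpolate the crossing time between samples k-1 and k
            frac = -signal[k - 1] / (signal[k] - signal[k - 1])
            crossings.append((k - 1 + frac) * dt)
    if len(crossings) < 2:
        return None  # not enough cycles observed
    periods = [b - a for a, b in zip(crossings, crossings[1:])]
    return 1.0 / (sum(periods) / len(periods))
```

A decaying exponential envelope does not move the zero crossings of a damped sinusoid, which is why counting crossings is a reasonable first estimate of the damped frequency.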
CV · Nov 3, 2024
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
Jitesh Joshi, Sos S. Agaian, Youngjun Cho
Remote photoplethysmography (rPPG) enables non-invasive extraction of blood volume pulse signals through imaging, transforming spatial-temporal data into time series signals. Advances in end-to-end rPPG approaches have focused on this transformation where attention mechanisms are crucial for feature extraction. However, existing methods compute attention disjointly across spatial, temporal, and channel dimensions. Here, we propose the Factorized Self-Attention Module (FSAM), which jointly computes multidimensional attention from voxel embeddings using nonnegative matrix factorization. To demonstrate FSAM's effectiveness, we developed FactorizePhys, an end-to-end 3D-CNN architecture for estimating blood volume pulse signals from raw video frames. Our approach adeptly factorizes voxel embeddings to achieve comprehensive spatial, temporal, and channel attention, enhancing performance of generic signal extraction tasks. Furthermore, we deploy FSAM within an existing 2D-CNN-based rPPG architecture to illustrate its versatility. FSAM and FactorizePhys are thoroughly evaluated against state-of-the-art rPPG methods, each representing different types of architecture and attention mechanism. We perform ablation studies to investigate the architectural decisions and hyperparameters of FSAM. Experiments on four publicly available datasets and intuitive visualization of learned spatial-temporal features substantiate the effectiveness of FSAM and enhanced cross-dataset generalization in estimating rPPG signals, suggesting its broader potential as a multidimensional attention mechanism. The code is accessible at https://github.com/PhysiologicAILab/FactorizePhys.
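FSAM itself is specified in the paper and the linked repository. As a generic illustration of the underlying operation only, the sketch below flattens a nonnegative feature volume into a channels × (time·height·width) matrix, factorizes it with Lee-Seung multiplicative-update NMF, and uses the rescaled low-rank reconstruction as a joint channel/spatio-temporal attention map. The rank, iteration count, ReLU-style clamping and max-rescaling are assumptions, not FSAM's settings.

```python
import numpy as np

def nmf(V, rank=1, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative-update NMF minimising ||V - W @ H||_F^2 (V >= 0)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def factorized_attention(features, rank=1):
    """Joint attention from a low-rank NMF of a (C, T, H, W) volume (illustrative only)."""
    C, T, Hh, Ww = features.shape
    V = np.maximum(features, 0).reshape(C, T * Hh * Ww)  # NMF requires nonnegative input
    W, H = nmf(V, rank=rank)
    recon = (W @ H).reshape(C, T, Hh, Ww)
    attention = recon / (recon.max() + 1e-9)  # rescale to [0, 1]
    return features * attention, attention
```

Because one factorization couples the channel factor `W` with the spatio-temporal factor `H`, the resulting map weights all dimensions jointly rather than computing separate channel, spatial and temporal attention.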
CV · Sep 13, 2023
Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency Detection
Guangyu Ren, Jitesh Joshi, Youngjun Cho
RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments. However, existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features. To address this, we first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions. The supervised loss component of MMHL distinctly utilizes semantic features from different modalities, while the self-supervised loss component reduces the distance between RGB and thermal features. We further consider both spatial and channel information during feature fusion and propose the Hybrid Fusion Module to effectively fuse RGB and thermal features. Lastly, instead of jointly training the network with cross-modal features, we implement a sequential training strategy which performs training only on RGB images in the first stage and then learns cross-modal features in the second stage. This training strategy improves saliency detection performance without computational overhead. Results from performance evaluation and ablation studies demonstrate the superior performance achieved by the proposed method compared with the existing state-of-the-art methods.
CV · Apr 30
MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video
Xijia Wei, Yuan Fang, Kevin Chetty et al.
Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, which discard the rich spatiotemporal information naturally present in radar video streams, while the extra signal processing adds system complexity. In addition, existing solutions are mainly trained in an end-to-end supervised manner without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns generalized, spatiotemporal, motion-aware representations from unlabelled radar video and uses its heatmap decoder for multi-frame pose prediction. We evaluate it on three datasets using leave-one-person-out cross-validation with rigorous statistical testing. MAEPose consistently outperforms state-of-the-art baselines by up to 22.1% in MPJPE (p < 0.05), and maintains robust accuracy under zero-shot bystander interference with only a 6.5% error increase. Ablation studies confirm that both the pre-training and the heatmap decoder contribute substantially, while modality analysis indicates that using Range-Doppler video as input achieves better pose estimation performance than Range-Azimuth or their fusion, at a lower computational cost.
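For readers unfamiliar with the metric, the sketch below computes the standard Mean Per-Joint Position Error (MPJPE), the mean Euclidean distance between predicted and ground-truth joints over all frames and joints, plus one common way of expressing a gain as a percentage error reduction. It assumes joint coordinates are already expressed in the same frame and units; MAEPose's alignment and leave-one-person-out protocol are not reproduced, and whether the reported 22.1% is computed exactly this way is an assumption.

```python
import math

def mpjpe(pred, target):
    """Mean Per-Joint Position Error.

    pred, target: nested lists shaped [frames][joints][3] in the same units (e.g. mm).
    """
    dists = [math.dist(p, t)
             for pred_frame, target_frame in zip(pred, target)
             for p, t in zip(pred_frame, target_frame)]
    return sum(dists) / len(dists)

def relative_reduction(baseline_error, new_error):
    """Percentage error reduction of a new model relative to a baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

For example, one frame with a perfectly predicted joint and a second joint off by (0, 3, 4) gives distances 0 and 5, so MPJPE is 2.5.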
AI · Aug 25, 2025
Spacer: Towards Engineered Scientific Inspiration
Minhyeong Lee, Suyoung Hwang, Seunghyun Moon et al.
Recent advances in LLMs have made automated scientific research the next frontline in the path to artificial superintelligence. However, these systems are bound either to tasks of narrow scope or the limited creative capabilities of LLMs. We propose Spacer, a scientific discovery system that develops creative and factually grounded concepts without external intervention. Spacer attempts to achieve this via 'deliberate decontextualization,' an approach that disassembles information into atomic units - keywords - and draws creativity from unexplored connections between them. Spacer consists of (i) Nuri, an inspiration engine that builds keyword sets, and (ii) the Manifesting Pipeline that refines these sets into elaborate scientific statements. Nuri extracts novel, high-potential keyword sets from a keyword graph built with 180,000 academic publications in biological fields. The Manifesting Pipeline finds links between keywords, analyzes their logical structure, validates their plausibility, and ultimately drafts original scientific concepts. According to our experiments, the evaluation metric of Nuri accurately classifies high-impact publications with an AUROC score of 0.737. Our Manifesting Pipeline also successfully reconstructs core concepts from the latest top-journal articles solely from their keyword sets. An LLM-based scoring system estimates that this reconstruction was sound for over 85% of the cases. Finally, our embedding space analysis shows that outputs from Spacer are significantly more similar to leading publications compared with those from SOTA LLMs.
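To make the reported AUROC of 0.737 concrete: AUROC is the probability that a randomly chosen positive (here, a high-impact publication) receives a higher score than a randomly chosen negative, so 0.5 is chance level and 1.0 is a perfect ranking. The pairwise sketch below computes it directly, counting ties as half; the scores in the usage check are hypothetical and unrelated to Nuri's actual metric.

```python
def auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative); ties count 0.5.

    labels: 1 for positive (e.g. high-impact), 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This O(P·N) form is meant for clarity; rank-based formulations give the same value more efficiently on large sets.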
CV · May 11, 2025
Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization
Jitesh Joshi, Youngjun Cho
Remote physiological sensing using camera-based technologies offers transformative potential for non-invasive vital sign monitoring across healthcare and human-computer interaction domains. Although deep learning approaches have advanced the extraction of physiological signals from video data, existing methods have not been sufficiently assessed for their robustness to domain shifts. These shifts in remote physiological sensing include variations in ambient conditions, camera specifications, head movements, facial poses, and physiological states, which often significantly impact real-world performance. Cross-dataset evaluation provides an objective measure of generalization across these domain shifts. We introduce the Target Signal Constrained Factorization module (TSFM), a novel multidimensional attention mechanism that explicitly incorporates physiological signal characteristics as factorization constraints, allowing more precise feature extraction. Building on this innovation, we present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous multitask estimation of remote photoplethysmography (rPPG) and remote respiratory (rRSP) signals from multimodal RGB and thermal video inputs. Through comprehensive cross-dataset evaluation on five benchmark datasets, we demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rPPG and rRSP estimation, while maintaining low inference latency suitable for real-time applications. Our approach establishes new benchmarks for robust multitask and multimodal physiological sensing and offers a computationally efficient framework for practical deployment in unconstrained environments. A web browser-based application featuring on-device real-time inference of the MMRPhys model is available at https://physiologicailab.github.io/mmrphys-live
CL · Jun 5, 2024
Exploring Human-AI Perception Alignment in Sensory Experiences: Do LLMs Understand Textile Hand?
Shu Zhong, Elia Gatti, Youngjun Cho et al.
Aligning the behaviour of large language models (LLMs) with human intent is critical for future AI. An important yet often overlooked aspect of this alignment is perceptual alignment. Perceptual modalities such as touch are more multifaceted and nuanced than other sensory modalities such as vision. This work investigates how well LLMs align with human touch experiences using the "textile hand" task. We created a "Guess What Textile" interaction in which participants were given two textile samples (a target and a reference) to handle. Without seeing them, participants described the differences between them to the LLM. Using these descriptions, the LLM attempted to identify the target textile by assessing similarity within its high-dimensional embedding space. Our results suggest that a degree of perceptual alignment exists, but it varies significantly among textile samples. For example, LLM predictions are well aligned for silk satin but not for cotton denim. Moreover, participants did not feel that the LLM predictions closely matched their textile experiences. This is only a first exploration of perceptual alignment around touch, exemplified through textile hand. We discuss possible sources of this alignment variance and how better human-AI perceptual alignment can benefit future everyday tasks.
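The identification step ("assessing similarity within its high-dimensional embedding space") can be pictured as nearest-neighbour retrieval by cosine similarity. The sketch below ranks candidate textiles against a query embedding; the three-dimensional vectors and the candidate names are hypothetical placeholders for real LLM embeddings, and how the study built the query from participants' descriptions is not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_candidates(query_vec, candidates):
    """Return candidate names sorted by cosine similarity to the query, most similar first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]
```

The top-ranked name is the predicted target; comparing it with the true target over many trials gives a per-textile measure of alignment.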
HC · Feb 12, 2021
Rethinking Eye-blink: Assessing Task Difficulty through Physiological Representation of Spontaneous Blinking
Youngjun Cho
Continuous assessment of task difficulty and mental workload is essential for improving the usability and accessibility of interactive systems. Eye-tracking data have often been investigated to achieve this, although standard blink metrics have been reported to play only a limited role. Here, we propose a new approach to the analysis of eye-blink responses for automated estimation of task difficulty. The core module is a time-frequency representation of eye-blink, which aims to capture the richness of information reflected in blinking. In our first study, we show that this method significantly improves sensitivity to task difficulty. We then demonstrate a framework in which the represented patterns are analyzed with multi-dimensional Long Short-Term Memory recurrent neural networks for their non-linear mapping onto difficulty-related parameters. This framework outperformed other methods that used hand-engineered features. The approach works with any built-in camera and does not require specialized devices. We conclude by discussing how Rethinking Eye-blink can benefit real-world applications.
HC · Dec 24, 2020
Hearing through Vibrations: Perception of Musical Emotions by Profoundly Deaf People
Anastasia Schmitz, Catherine Holloway, Youngjun Cho
Advances in tactile-audio feedback technology have created new possibilities for deaf people to feel music. However, little is known about deaf individuals' perception of musical emotions through vibrotactile feedback. In this paper, we present the findings from a mixed-methods study with 16 profoundly deaf participants. The study protocol was designed to explore how users of a backpack-style vibrotactile display perceive intended emotions in twenty music excerpts. Quantitative analysis demonstrated that participants correctly identified happy and angry excerpts and rated them as more arousing than sad and peaceful excerpts. Participants experienced more positive emotions during happy than during angry excerpts, while peaceful and sad excerpts were hard to differentiate. Based on qualitative data, we highlight the benefits and limitations of using vibrations to convey musical emotions to profoundly deaf users. Finally, we provide guidelines for designing accessible music experiences for the deaf community.
HC · Aug 27, 2019
Physiological and Affective Computing through Thermal Imaging: A Survey
Youngjun Cho, Nadia Bianchi-Berthouze
Thermal imaging-based physiological and affective computing is an emerging research area enabling technologies to monitor our bodily functions and understand psychological and affective needs in a contactless manner. However, until recently, research has mainly been carried out in highly controlled lab settings. As small and even low-cost thermal video cameras have started to appear on the market, mobile thermal imaging is opening the door to ubiquitous and real-world applications. Here we review the literature on the use of thermal imaging to track changes in physiological cues relevant to affective computing, and the technological requirements set so far. In doing so, we aim to establish computational and methodological pipelines from thermal images of the human skin to affective states, and to outline the research opportunities and challenges that must be tackled to make ubiquitous, real-life thermal imaging-based affect monitoring a possibility.
HC · May 13, 2019
Nose Heat: Exploring Stress-induced Nasal Thermal Variability through Mobile Thermal Imaging
Youngjun Cho, Nadia Bianchi-Berthouze, Manuel Oliveira et al.
Automatically monitoring and quantifying stress-induced thermal dynamic information in real-world settings is an extremely important but challenging problem. In this paper, we explore whether we can use mobile thermal imaging to measure the rich physiological cues of mental stress that can be deduced from a person's nose temperature. To answer this question, we build (i) a framework for continuously monitoring nasal thermal variability patterns and (ii) a novel set of thermal variability metrics to capture the richness of the dynamic information. We evaluated our approach in a series of studies, including laboratory-based psychosocial stress-induction tasks and real-world factory settings. We demonstrate that our approach has the potential to assess stress responses beyond controlled laboratory settings.
HC · Apr 12, 2019
Expressive haptics for enhanced usability of mobile interfaces in situations of impairments
Tigmanshu Bhatnagar, Youngjun Cho, Nicolai Marquardt et al.
Designing for situational awareness could lead to better solutions for disabled people; likewise, exploring the needs of disabled people could lead to innovations that address situational impairments. This in turn can create non-stigmatising assistive technology for disabled people from which everyone could eventually benefit. In this paper, we investigate the potential for advanced haptics to complement the graphical user interface of mobile devices, thereby enhancing the user experience of visually impaired people and of all people in certain situations (e.g. sunlight interfering with interaction). We explore technical solutions in this problem space and justify our focus on creating kinaesthetic force feedback. We propose initial design concepts and studies, with a view to co-creating delightful and expressive haptic interactions with potential users, motivated by scenarios of situational and permanent impairments.
MED-PH · Dec 21, 2018
Instant Automated Inference of Perceived Mental Stress through Smartphone PPG and Thermal Imaging
Youngjun Cho, Simon J. Julier, Nadia Bianchi-Berthouze
Background: A smartphone is a promising tool for daily cardiovascular measurement and mental stress monitoring. Smartphone camera-based PhotoPlethysmoGraphy (PPG) and a low-cost thermal camera can be used to create cheap, convenient and mobile monitoring systems. However, to ensure reliable monitoring results, a person has to remain still for several minutes while a measurement is being taken. This is very cumbersome and makes such systems impractical in real-life mobile situations. Objective: We propose a system that combines PPG and thermography with the aim of improving cardiovascular signal quality and capturing stress responses quickly. Methods: Using a smartphone camera with an add-on low-cost thermal camera, we built a novel system that continuously and reliably measures two different types of cardiovascular events: (i) blood volume pulse and (ii) vasoconstriction/dilation-induced temperature changes of the nose tip. Seventeen healthy participants took part in a series of stress-inducing mental workload tasks, and their physiological responses to stressors were measured over a short window of time (20 seconds) immediately after each task. Participants reported their level of perceived mental stress using a 10-cm Visual Analogue Scale (VAS). We used normalized K-means clustering to reduce interpersonal differences in the self-reported ratings. For the instant stress inference task, we built novel low-level feature sets representing the variability of cardiovascular patterns. We then used the automatic feature learning capability of artificial Neural Networks (NN) to improve the mapping between the extracted set of features and the self-reported ratings. We compared our proposed method with existing machine learning methods based on hand-engineered features. Results and Conclusions: due to limited space here, we refer readers to our manuscript.
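The abstract names "normalized K-means clustering" for reducing interpersonal differences in VAS ratings without giving details. One plausible reading, sketched below purely as an assumption and not as the paper's procedure, is to min-max normalise each participant's ratings to [0, 1] and then group the pooled values into discrete stress levels with 1-D k-means (Lloyd's algorithm). The function names and the choice of two levels are hypothetical.

```python
def minmax_per_participant(ratings):
    """Rescale each participant's VAS ratings (cm) to [0, 1] to reduce interpersonal offsets."""
    out = {}
    for pid, values in ratings.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        out[pid] = [(v - lo) / span if span else 0.0 for v in values]
    return out

def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm) with evenly spaced initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # keep the old centroid if a cluster ends up empty
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
    return labels, centroids
```

Normalising first means a participant who only ever marks 4 to 7 cm and one who uses the full 1 to 9 cm range map onto the same scale before clustering.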
CV · Mar 6, 2018
Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns
Youngjun Cho, Nadia Bianchi-Berthouze, Nicolai Marquardt et al.
We introduce Deep Thermal Imaging, a new approach for close-range automatic recognition of materials that aims to enhance how people and ubiquitous technologies understand their proximal environment. Our approach uses a low-cost mobile thermal camera integrated into a smartphone to capture thermal textures. A deep neural network classifies these textures into material types. This approach works effectively without the need for ambient light sources or direct contact with materials. Furthermore, the use of a deep learning network removes the need to handcraft feature sets for different materials. We evaluated the performance of the system by training it to recognise 32 material types in both indoor and outdoor environments. Our approach produced recognition accuracies above 98% on 14,860 images of 15 indoor materials and above 89% on 26,584 images of 17 outdoor materials. We conclude by discussing its potential for real-time use in HCI applications and future directions.
HC · Mar 6, 2018
RealPen: Providing Realism in Handwriting Tasks on Touch Surfaces using Auditory-Tactile Feedback
Youngjun Cho, Andrea Bianchi, Nicolai Marquardt et al.
We present RealPen, an augmented stylus for capacitive tablet screens that recreates the physical sensation of writing on paper with a pencil, ball-point pen or marker pen. The aim is to create a more engaging experience when writing on touch surfaces, such as the screens of tablet computers. This is achieved by regenerating the friction-induced oscillation and sound of a real writing tool in contact with paper. To generate realistic tactile feedback, our algorithm analyses the frequency spectrum of the friction oscillation generated when writing with traditional tools, extracts its principal frequencies, and uses the actuator's frequency response profile as an adjustment weighting function. We enhance realism by providing sound feedback aligned with writing pressure and speed. Furthermore, we investigated the effects of superposition and fluctuation of several frequencies on human tactile perception, evaluated the performance of RealPen, and characterized users' perception of and preference for each feedback type.
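The peak-extraction and weighting steps described above can be sketched as follows: take a windowed FFT of a recorded friction signal, keep the strongest local maxima as principal frequencies, and scale each amplitude by the inverse of the actuator's gain at that frequency so that weak actuator bands are boosted. Peak picking by local maxima and inverse-gain weighting are assumptions about how such a weighting function might be applied; `actuator_gain` is a hypothetical callable standing in for a measured frequency response profile.

```python
import numpy as np

def principal_frequencies(signal, fs, n_peaks=3):
    """Return the n strongest spectral peaks as (frequency in Hz, magnitude) pairs."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))  # Hann-windowed FFT
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # indices of local maxima (strictly larger than both neighbours)
    idx = np.where((spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:]))[0] + 1
    top = idx[np.argsort(spectrum[idx])[::-1][:n_peaks]]
    return [(float(freqs[i]), float(spectrum[i])) for i in top]

def drive_weights(peaks, actuator_gain):
    """Scale each peak's magnitude by the inverse of the actuator's gain at that frequency."""
    return [(f, mag / actuator_gain(f)) for f, mag in peaks]
```

The resulting (frequency, amplitude) pairs could then drive a sum of sinusoids on the actuator; how RealPen modulates them with pressure and speed is not covered here.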
HC · Oct 13, 2017
ThermSense: Smartphone-based Breathing Sensing Platform using Noncontact Low-Cost Thermal Camera
Youngjun Cho, Nadia Bianchi-Berthouze, Simon J. Julier et al.
The ability to sense breathing is becoming an increasingly important function for technology that aims to support both psychological and physical wellbeing. We demonstrate ThermSense, a new breathing sensing platform based on smartphone technology and a low-cost thermal camera, which allows users to measure their breathing patterns in a contact-free manner. With its key functions, Thermal Voxel Integration-based breathing estimation and the respiration variability spectrogram (RVS, a bi-dimensional representation of breathing dynamics), the platform provides scalability and flexibility for gathering respiratory physiological measurements ubiquitously. This functionality could be used for a variety of applications, from stress monitoring to respiration training.
HC · Aug 20, 2017
DeepBreath: Deep Learning of Breathing Patterns for Automatic Stress Recognition using Low-Cost Thermal Imaging in Unconstrained Settings
Youngjun Cho, Nadia Bianchi-Berthouze, Simon J. Julier
We propose DeepBreath, a deep learning model that automatically recognises people's psychological stress level (mental overload) from their breathing patterns. Using a low-cost thermal camera, we track a person's breathing patterns as temperature changes around their nostrils. The paper's technical contribution is threefold. First, instead of creating hand-crafted features to capture aspects of the breathing patterns, we transform the one-dimensional breathing signals into two-dimensional respiration variability spectrogram (RVS) sequences. The spectrograms easily capture the complexity of the breathing dynamics. Second, a spatial pattern analysis based on a deep Convolutional Neural Network (CNN) is applied directly to the spectrogram sequences without the need for hand-crafted features. Finally, a data augmentation technique, inspired by solutions to over-fitting problems in deep learning, is applied to allow the CNN to learn from a small-scale dataset of short-term measurements (e.g., up to a few hours). The model is trained and tested with data collected from people exposed to two types of cognitive tasks (Stroop Colour Word Test, Mental Computation test) with sessions of different difficulty levels. Using normalised self-report as ground truth, the CNN reaches 84.59% accuracy in discriminating between two levels of stress and 56.52% in discriminating between three levels. In addition, the CNN outperformed powerful shallow learning methods based on a single-layer neural network. The dataset of labelled thermal images will also be made open to the community.
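The RVS is defined in the paper; the sketch below only illustrates the general idea of turning a one-dimensional breathing signal into a two-dimensional time-frequency image by computing power spectra over sliding windows and keeping a respiratory frequency band. The 20 s window, 1 s hop, 0.1 to 1.0 Hz band and Hann taper are assumptions chosen for illustration, not the paper's parameters.

```python
import numpy as np

def breathing_spectrogram(signal, fs, win_s=20.0, hop_s=1.0, band=(0.1, 1.0)):
    """Sliding-window power spectra of a 1-D breathing signal.

    Returns (freqs, spec) where spec has shape (n_windows, n_band_bins).
    """
    signal = np.asarray(signal, dtype=float)
    win, hop = int(win_s * fs), int(hop_s * fs)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])  # respiratory band only
    taper = np.hanning(win)
    rows = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = signal[start:start + win]
        seg = (seg - seg.mean()) * taper  # remove offset, reduce spectral leakage
        rows.append(np.abs(np.fft.rfft(seg)) ** 2)
    spec = np.array(rows).reshape(len(rows), len(freqs))
    return freqs[keep], spec[:, keep]
```

Each row is one window, so stacking rows over time yields an image whose texture reflects how breathing rate and regularity change, which a CNN can then analyse directly.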
CV · May 8, 2017
Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging
Youngjun Cho, Simon J. Julier, Nicolai Marquardt et al.
The ability to monitor respiratory rate is extremely important for the medical treatment, healthcare and fitness sectors. In many situations, mobile methods that allow users to undertake everyday activities are required. However, current monitoring systems can be obtrusive, requiring users to wear respiration belts or nasal probes. Recent advances in thermographic systems have shrunk their size, weight and cost to the point where it is possible to create smartphone-based respiration rate monitoring devices that are not affected by lighting conditions. However, mobile thermal imaging is challenged in scenes with high thermal dynamic ranges. This challenge is further amplified by general problems such as motion artifacts and low spatial resolution, leading to unreliable breathing signals. In this paper, we propose a novel and robust approach for respiration tracking that compensates for the negative effects of variations in ambient temperature and motion artifacts, and can accurately extract breathing rates in highly dynamic thermal scenes. The approach has three main contributions. The first is a novel Optimal Quantization technique that adaptively constructs a color mapping of absolute temperature to improve segmentation, classification and tracking. The second is the Thermal Gradient Flow method, which computes thermal gradient magnitude maps to enhance the accuracy of nostril region tracking. Finally, we introduce the Thermal Voxel method to increase the reliability of the captured respiration signals compared to the traditional averaging method. We demonstrate the extreme robustness of our system in tracking the nostril region and measuring the respiratory rate in high dynamic range scenes.
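As a rough illustration of why an adaptive temperature-to-colour mapping helps in high-dynamic-range scenes, the sketch below quantises absolute temperatures using the range of a region of interest instead of the whole frame, so a hot object in the background does not compress facial contrast into a few levels. This plain min-max mapping is an assumption standing in for the paper's Optimal Quantization, whose details are not given in the abstract; the function name and the `roi` tuple format are hypothetical.

```python
def adaptive_quantize(frame, levels=256, roi=None):
    """Map absolute temperatures (deg C) to integer levels using a data-driven range.

    frame: 2-D list of temperatures.
    roi: optional (row0, row1, col0, col1) region whose min/max define the mapping;
         defaults to the whole frame. Values outside the range are clipped.
    """
    r0, r1, c0, c1 = roi if roi else (0, len(frame), 0, len(frame[0]))
    region = [t for row in frame[r0:r1] for t in row[c0:c1]]
    t_min, t_max = min(region), max(region)
    span = (t_max - t_min) or 1.0  # avoid division by zero on a flat region
    out = []
    for row in frame:
        q_row = []
        for t in row:
            x = (min(max(t, t_min), t_max) - t_min) / span  # clip, then scale to [0, 1]
            q_row.append(min(int(x * levels), levels - 1))
        out.append(q_row)
    return out
```

In a frame containing a 60 °C object and skin at 34 to 36 °C, a whole-frame mapping leaves the skin spanning only a handful of levels, whereas an ROI-based mapping spreads it across the full range.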