CV Jul 29, 2024
Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS
Stefanos Gkikas, Manolis Tsiknakis
Automatic pain assessment plays a critical role in advancing healthcare and optimizing pain management strategies. This study has been submitted to the First Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed multimodal framework utilizes facial videos and fNIRS and presents a modality-agnostic approach, alleviating the need for domain-specific models. Employing a dual ViT configuration and adopting waveform representations for the fNIRS, as well as for the extracted embeddings from the two modalities, demonstrates the efficacy of the proposed method, which achieves an accuracy of 46.76% in the multilevel pain assessment task.
AI Jul 28, 2024
Multi-task Neural Networks for Pain Intensity Estimation using Electrocardiogram and Demographic Factors
Stefanos Gkikas, Chariklia Chatzaki, Manolis Tsiknakis
Pain is a complex phenomenon that patients manifest and express in various forms. Its immediate and objective recognition is of great importance for attaining a reliable and unbiased healthcare system. In this work, we analyze electrocardiography signals, revealing variations in pain perception among different demographic groups. We exploit this insight by introducing a novel multi-task neural network for automatic pain estimation that utilizes the age and gender information of each individual, and show its advantages compared to other approaches.
CV Jul 29, 2024
Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture
Stefanos Gkikas, Manolis Tsiknakis
Pain assessment is essential in developing optimal pain management protocols to alleviate suffering and prevent functional decline in patients. Consequently, reliable and accurate automatic pain assessment systems are essential for continuous and effective patient monitoring. This study presents synthetic thermal videos generated by Generative Adversarial Networks, integrates them into the pain recognition pipeline, and evaluates their efficacy. A framework consisting of a Vision-MLP and a Transformer-based module is utilized, employing RGB and synthetic thermal videos in unimodal and multimodal settings. Experiments conducted on facial videos from the BioVid database demonstrate the effectiveness of synthetic thermal videos and underline their potential advantages.
63.2 SP Apr 21
1BT: One-Block Transformer for EEG-Based Cognitive Workload Assessment
Stefanos Gkikas, Christian Arzate Cruz, Thomas Kassiotis et al.
Accurate and continuous estimation of cognitive workload is fundamental to creating adaptive human-machine systems. However, designing architectures that balance representational capacity with computational efficiency has been challenging for practical deployment. This paper introduces 1BT, a One-Block Transformer for compact and efficient EEG-based cognitive workload assessment. The model aggregates multi-channel temporal sequences via a minimal latent bottleneck, using a single cross-attention module followed by lightweight self-attention. A controlled study involving 11 participants performing three cognitively diverse tasks (abstract reasoning, numerical problem-solving, and an interactive video game) was conducted with continuous EEG recordings across two workload levels. Systematic architectural analysis identifies the most compact configuration that preserves high performance, while substantially lowering computational cost. The final model achieves high workload classification performance with under 0.5 million parameters and 0.02 GFLOPs, paving the way for a design direction for real-time cognitive workload monitoring in resource-constrained settings.
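The latent-bottleneck idea in this abstract (a single cross-attention step followed by self-attention over a small set of latents) can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random inputs, the function names, and the omission of learned projections, residual connections, and normalization are all assumptions made for brevity.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    """Scaled dot-product attention; keys and values share the same vectors.

    Assumption: no learned projection matrices, to keep the sketch short.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(dim)])
    return out

def one_block(tokens, latents):
    """Hypothetical one-block layout: cross-attention, then self-attention."""
    # Cross-attention: a few latent vectors query the long input sequence,
    # so the cost grows with num_latents * num_tokens, not num_tokens ** 2.
    latents = attend(latents, tokens)
    # Self-attention runs only among the latents, which stays cheap.
    return attend(latents, latents)

random.seed(0)
dim, num_tokens, num_latents = 8, 200, 4  # illustrative sizes only
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]

summary = one_block(tokens, latents)
print(len(summary), len(summary[0]))  # 4 8
```

Whatever the input length, the output has one vector per latent, which is what makes such a bottleneck attractive for compact models.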
66.9 CV Apr 13
A Lightweight Transformer for Pain Recognition from Brain Activity
Stefanos Gkikas, Christian Arzate Cruz, Yu Fang et al.
Pain is a multifaceted and widespread phenomenon with substantial clinical and societal burden, making reliable automated assessment a critical objective. This paper presents a lightweight transformer architecture that fuses multiple fNIRS representations through a unified tokenization mechanism, enabling joint modeling of complementary signal views without requiring modality-specific adaptations or increasing architectural complexity. The proposed token-mixing strategy preserves spatial, temporal, and time-frequency characteristics by projecting heterogeneous inputs onto a shared latent representation, using a structured segmentation scheme to control the granularity of local aggregation and global interaction. The model is evaluated on the AI4Pain dataset using stacked raw waveform and power spectral density representations of fNIRS inputs. Experimental results demonstrate competitive pain recognition performance while remaining computationally compact, making the approach suitable for real-time inference on both GPU and CPU hardware.
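To make the second signal view concrete, the following sketch computes a simple power spectral density estimate (a periodogram via a naive discrete Fourier transform) for a one-dimensional waveform. The abstract does not state which PSD estimator the authors use, so the estimator, the function name `periodogram`, and the synthetic sine input are assumptions.

```python
import cmath
import math

def periodogram(signal):
    """Naive periodogram: |DFT|^2 / N for the non-negative frequency bins.

    Assumption: a plain, unwindowed periodogram stands in for whatever
    PSD estimator the original work applies to fNIRS channels.
    """
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power.append(abs(coeff) ** 2 / n)
    return power

# Synthetic stand-in for one channel: a sine with exactly 4 cycles in 32 samples.
n = 32
wave = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

psd = periodogram(wave)
peak_bin = max(range(len(psd)), key=psd.__getitem__)
print(peak_bin)  # 4
```

The raw waveform and its PSD describe the same segment in the time and frequency domains, which is the kind of complementary pair a shared tokenization scheme would take as input.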
43.9 RO Apr 13
Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech
Edwin C. Montiel-Vazquez, Christian Arzate Cruz, Stefanos Gkikas et al.
Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression on the BEAT2 dataset, while remaining computationally compact and suitable for real-time deployment on embodied agents.
CV May 2, 2025 Code
PainFormer: a Vision Foundation Model for Automatic Pain Assessment
Stefanos Gkikas, Raul Fernandez Rojas, Manolis Tsiknakis
Pain is a manifold condition that impacts a significant percentage of the population. Accurate and reliable pain evaluation for the people suffering is crucial to developing effective and advanced pain management protocols. Automatic pain assessment systems provide continuous monitoring and support decision-making processes, ultimately aiming to alleviate distress and prevent functional decline. This study introduces PainFormer, a vision foundation model based on multi-task learning principles trained simultaneously on 14 tasks/datasets with a total of 10.9 million samples. Functioning as an embedding extractor for various input modalities, the foundation model provides feature representations to the Embedding-Mixer, a transformer-based module that performs the final pain assessment. Extensive experiments employing behavioral modalities - including RGB, synthetic thermal, and estimated depth videos - and physiological modalities such as ECG, EMG, GSR, and fNIRS revealed that PainFormer effectively extracts high-quality embeddings from diverse input modalities. The proposed framework is evaluated on two pain datasets, BioVid and AI4Pain, and directly compared to 75 different methodologies documented in the literature. Experiments conducted in unimodal and multimodal settings demonstrate state-of-the-art performance across modalities and pave the way toward general-purpose models for automatic pain assessment. The foundation model's architecture (code) and weights are available at: https://github.com/GkikasStefanos/PainFormer.
AI Jul 29, 2025 Code
Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis
Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis
Pain is a complex and pervasive condition that affects a significant portion of the population. Accurate and consistent assessment is essential for individuals suffering from pain, as well as for developing effective management strategies in a healthcare system. Automatic pain assessment systems enable continuous monitoring, support clinical decision-making, and help minimize patient distress while mitigating the risk of functional deterioration. Leveraging physiological signals offers objective and precise insights into a person's state, and their integration in a multimodal framework can further enhance system performance. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed approach introduces Tiny-BioMoE, a lightweight pretrained embedding model for biosignal analysis. Trained on 4.4 million biosignal image representations and consisting of only 7.3 million parameters, it serves as an effective tool for extracting high-quality embeddings for downstream tasks. Extensive experiments involving electrodermal activity, blood volume pulse, respiratory signals, peripheral oxygen saturation, and their combinations highlight the model's effectiveness across diverse modalities in automatic pain recognition tasks. The model's architecture (code) and weights are available at https://github.com/GkikasStefanos/Tiny-BioMoE.
CV Dec 19, 2024
A Full Transformer-based Framework for Automatic Pain Estimation using Videos
Stefanos Gkikas, Manolis Tsiknakis
The automatic estimation of pain is essential in designing an optimal pain management system that offers reliable assessment and reduces the suffering of patients. In this study, we present a novel full transformer-based framework consisting of a Transformer in Transformer (TNT) model and a Transformer leveraging cross-attention and self-attention blocks. Using videos from the BioVid database, we demonstrate state-of-the-art performance, showing efficacy, efficiency, and generalization capability across all the primary pain estimation tasks.
AI Jul 29, 2025
Efficient Pain Recognition via Respiration Signals: A Single Cross-Attention Transformer Multi-Window Fusion Pipeline
Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis
Pain is a complex condition that affects a large portion of the population. Accurate and consistent evaluation is essential for individuals experiencing pain and supports the development of effective and advanced management strategies. Automatic pain assessment systems provide continuous monitoring, aid clinical decision-making, and aim to reduce distress while preventing functional decline. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed method introduces a pipeline that employs respiration as the input signal and integrates a highly efficient cross-attention transformer with a multi-windowing strategy. Extensive experiments demonstrate that respiration serves as a valuable physiological modality for pain assessment. Furthermore, results show that compact and efficient models, when properly optimized, can deliver strong performance, often surpassing larger counterparts. The proposed multi-window strategy effectively captures short-term and long-term features, along with global characteristics, enhancing the model's representational capacity.
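A multi-window strategy of this kind can be sketched as slicing one signal at several temporal scales. The abstract does not give window lengths, overlap, or the fusion rule, so the function names, the non-overlapping windows, and the sizes below are illustrative assumptions only.

```python
def sliding_windows(signal, size, step):
    """Return consecutive windows of `size` samples, advancing by `step`."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def multi_window_views(signal, short_size, long_size):
    """Hypothetical multi-window split: short, long, and global views.

    Assumption: windows do not overlap (step equals window size).
    """
    return {
        "short": sliding_windows(signal, short_size, short_size),
        "long": sliding_windows(signal, long_size, long_size),
        "global": [list(signal)],  # the whole recording as one window
    }

# Stand-in for a respiration trace: 12 samples.
signal = list(range(12))
views = multi_window_views(signal, short_size=3, long_size=6)

print({name: len(windows) for name, windows in views.items()})
# {'short': 4, 'long': 2, 'global': 1}
```

Each view would then be encoded separately and the results fused, so that short windows contribute fine-grained dynamics while the long and global windows contribute slower trends.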
AI Jul 29, 2025
Multi-Representation Diagrams for Pain Recognition: Integrating Various Electrodermal Activity Signals into a Single Image
Stefanos Gkikas, Ioannis Kyprakis, Manolis Tsiknakis
Pain is a multifaceted phenomenon that affects a substantial portion of the population. Reliable and consistent evaluation supports individuals experiencing pain and enables the development of effective and advanced management strategies. Automatic pain-assessment systems provide continuous monitoring, guide clinical decision-making, and aim to reduce distress while preventing functional decline. Incorporating physiological signals allows these systems to deliver objective, accurate insights into an individual's condition. This study has been submitted to the Second Multimodal Sensing Grand Challenge for Next-Gen Pain Assessment (AI4PAIN). The proposed method introduces a pipeline that employs electrodermal activity signals as the input modality. Multiple signal representations are generated and visualized as waveforms, which are then jointly presented within a unified multi-representation diagram. Extensive experiments using diverse processing and filtering techniques, along with various representation combinations, highlight the effectiveness of the approach. It consistently achieves comparable and, in several cases, superior results to traditional fusion methods, positioning it as a robust alternative for integrating different signal representations or modalities.
AI May 8, 2025
A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods
Stefanos Gkikas
From the original abstract: This thesis initially aims to study the pain assessment process from a clinical-theoretical perspective while exploring and examining existing automatic approaches. Building on this foundation, the primary objective of this Ph.D. project is to develop innovative computational methods for automatic pain assessment that achieve high performance and are applicable in real clinical settings. A primary goal is to thoroughly investigate and assess, from a computational standpoint, significant factors that impact pain perception, including the demographic elements recognized in pain research. Within the limits of the available data in this research area, our goal was to design, develop, and propose automatic pain assessment pipelines for unimodal and multimodal configurations that are applicable to the specific requirements of different scenarios. The studies published in this Ph.D. thesis showcased the effectiveness of the proposed methods, achieving state-of-the-art results. Additionally, they paved the way for exploring new approaches in artificial intelligence, foundation models, and generative artificial intelligence.