Yujian Ma

2 Papers

SD · Sep 10, 2025 · Code
Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Yujian Ma, Jinqiu Sang, Ruizhe Li

Large pre-trained speech models such as Whisper offer strong generalization but pose significant challenges for resource-efficient adaptation. Low-Rank Adaptation (LoRA) has become a popular parameter-efficient fine-tuning method, yet its underlying mechanisms in speech tasks remain poorly understood. In this work, we conduct the first systematic mechanistic interpretability study of LoRA within the Whisper encoder for speech emotion recognition (SER). Using a suite of analytical tools, including layer contribution probing, logit-lens inspection, and representational similarity via singular value decomposition (SVD) and centered kernel alignment (CKA), we reveal two key mechanisms: a delayed specialization process that preserves general features in early layers before consolidating task-specific information, and a forward alignment, backward differentiation dynamic between LoRA's matrices. Our findings clarify how LoRA reshapes encoder hierarchies, providing both empirical insights and a deeper mechanistic understanding for designing efficient and interpretable adaptation strategies in large speech models. Our code is available at https://github.com/harryporry77/Behind-the-Scenes.
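As a rough illustration of the representational-similarity analysis mentioned in the abstract, the sketch below computes linear CKA between two sets of activations. It is not taken from the authors' repository: the abstract does not say which CKA variant is used, so the linear form is an assumption, and the function names (`center_columns`, `cross_product`, `frobenius_sq`, `linear_cka`) are hypothetical. It is written in plain Python to stay dependency-free; an array library would be the natural choice at realistic sizes.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product(a, b):
    """Return a^T b for two row-major matrices with the same number of rows."""
    return [
        [sum(p * q for p, q in zip(col_a, col_b)) for col_b in zip(*b)]
        for col_a in zip(*a)
    ]


def frobenius_sq(m):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(v * v for row in m for v in row)


def linear_cka(x, y):
    """Linear CKA between activations x (n x d1) and y (n x d2).

    Rows are the same n inputs seen by two layers or two models.
    Assumption: linear kernel; the source abstract does not specify one.
    """
    if len(x) != len(y):
        raise ValueError("x and y must describe the same number of inputs")
    xc, yc = center_columns(x), center_columns(y)
    cross = frobenius_sq(cross_product(yc, xc))            # ||Y^T X||_F^2
    norm_x = math.sqrt(frobenius_sq(cross_product(xc, xc)))  # ||X^T X||_F
    norm_y = math.sqrt(frobenius_sq(cross_product(yc, yc)))  # ||Y^T Y||_F
    return cross / (norm_x * norm_y)


# Toy activations: 4 inputs, 2 features each (made-up numbers).
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# The same features rotated by 90 degrees and scaled by 3.
layer_b = [[3.0 * b, -3.0 * a] for a, b in layer_a]
similarity = linear_cka(layer_a, layer_b)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so `similarity` here equals 1 up to rounding. That invariance is what makes the measure usable for comparing layers whose feature axes are not aligned.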

CV · May 7, 2023
Camera-Based HRV Prediction for Remote Learning Environments

Kegang Wang, Yantao Wei, Jiankai Tang et al.

In recent years, with the widespread use of internet video, remote photoplethysmography (rPPG) has gained increasing attention in the field of affective computing. Restoring blood volume pulse (BVP) signals from facial videos is a challenging task that involves a series of preprocessing, image-processing, and postprocessing steps to recover the waveform. Heart rate is used in affective computing, but heart rate variability (HRV) is an even more significant metric. The challenge in obtaining HRV indices through rPPG is that the algorithm must precisely predict the BVP peak positions. In this paper, we collected the Remote Learning Affect and Physiology (RLAP) dataset, which includes over 32 hours of highly synchronized video and labels from 58 subjects. It is a public dataset whose BVP labels have been meticulously designed to better suit the training of HRV models. Using the RLAP dataset, we trained a new model called Seq-rPPG, which is based on one-dimensional convolution. Experimental results show that this structure is better suited to HRV tasks: it outperformed all other baselines in HRV performance and also demonstrated significant advantages in computational efficiency.
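To make concrete why HRV depends on precise BVP peak positions, the sketch below derives a few time-domain indices from a list of peak timestamps. It is not the authors' evaluation code: the abstract does not name the HRV indices it reports, so SDNN and RMSSD are assumed here as common examples, and the function name `hrv_indices` and the sample timestamps are hypothetical.

```python
import math
import statistics


def hrv_indices(peak_times_s):
    """Time-domain HRV indices from BVP peak timestamps given in seconds.

    Assumption: SDNN and RMSSD stand in for the unspecified HRV indices.
    """
    if len(peak_times_s) < 3:
        raise ValueError("need at least three peaks")
    # Inter-beat intervals in milliseconds between consecutive peaks.
    ibis_ms = [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]
    # Differences between successive inter-beat intervals.
    successive = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return {
        "mean_hr_bpm": 60000.0 / statistics.mean(ibis_ms),
        "sdnn_ms": statistics.stdev(ibis_ms),  # sample standard deviation
        "rmssd_ms": math.sqrt(statistics.mean([d * d for d in successive])),
    }


# Made-up peak times; the second list shifts one peak by 20 ms.
clean_peaks = [0.0, 0.8, 1.7, 2.5, 3.4]
jittered_peaks = [0.0, 0.8, 1.72, 2.5, 3.4]
clean = hrv_indices(clean_peaks)
jittered = hrv_indices(jittered_peaks)
```

Shifting a single peak changes the two inter-beat intervals around it in opposite directions, so `jittered["rmssd_ms"]` differs from `clean["rmssd_ms"]` far more than the mean heart rate does. This is the sense in which HRV is more demanding of peak localization than heart rate.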