Qiyang Sun

LG
h-index29
7papers
86citations
Novelty31%
AI Score42

7 Papers

3.9SDApr 29
Explainable Detection of Machine Generated Music and Early Systematic Evaluation

Yupei Li, Qiyang Sun, Hanqian Li et al.

Machine-generated music (MGM) has become a groundbreaking innovation with wide-ranging applications, such as music therapy, personalised editing, and creative inspiration within the music industry. However, the unregulated proliferation of MGM presents considerable challenges to the entertainment, education, and arts sectors by potentially undermining the value of high-quality human compositions. Consequently, MGM detection (MGMD) is crucial for preserving the integrity of these fields. Despite its significance, MGMD domain lacks comprehensive systematic evaluation results necessary to drive meaningful progress. To address this gap, we conduct experiments on existing large-scale datasets using a range of foundational models for audio processing, establishing systematic evaluation results tailored to the MGMD task. Our selection includes traditional machine learning models, deep neural networks, Transformer-based architectures, and State space models (SSM). Recognising the inherently multimodal nature of music, which integrates both melody and lyrics, we also explore fundamental multimodal models in our experiments. Beyond providing basic binary classification outcomes, we delve deeper into model behaviour using multiple explainable Artificial Intelligence (XAI) tools, offering insights into their decision-making processes. Our analysis reveals that ResNet18 performs the best according to in-domain and out-of-domain tests. By providing a comprehensive comparison of systematic evaluation results and their interpretability, we propose several directions to inspire future research to develop more robust and effective detection methods for MGM. We provide our codes and some samples on Github repository.

CLMar 26, 2025Code
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations

Yupei Li, Qiyang Sun, Sunil Munthumoduku Krishna Murthy et al.

Affective Computing (AC) is essential for advancing Artificial General Intelligence (AGI), with emotion recognition serving as a key component. However, human emotions are inherently dynamic, influenced not only by an individual's expressions but also by interactions with others, and single-modality approaches often fail to capture their full dynamics. Multimodal Emotion Recognition (MER) leverages multiple signals but traditionally relies on utterance-level analysis, overlooking the dynamic nature of emotions in conversations. Emotion Recognition in Conversation (ERC) addresses this limitation, yet existing methods struggle to align multimodal features and explain why emotions evolve within dialogues. To bridge this gap, we propose GatedxLSTM, a novel speech-text multimodal ERC model that explicitly considers voice and transcripts of both the speaker and their conversational partner(s) to identify the most influential sentences driving emotional shifts. By integrating Contrastive Language-Audio Pretraining (CLAP) for improved cross-modal alignment and employing a gating mechanism to emphasise emotionally impactful utterances, GatedxLSTM enhances both interpretability and performance. Additionally, the Dialogical Emotion Decoder (DED) refines emotion predictions by modelling contextual dependencies. Experiments on the IEMOCAP dataset demonstrate that GatedxLSTM achieves state-of-the-art (SOTA) performance among open-source methods in four-class emotion classification. These results validate its effectiveness for ERC applications and provide an interpretability analysis from a psychological perspective.

LGNov 15, 2024
Explainable Artificial Intelligence for Medical Applications: A Review

Qiyang Sun, Alican Akman, Björn W. Schuller

The continuous development of artificial intelligence (AI) theory has propelled this field to unprecedented heights, owing to the relentless efforts of scholars and researchers. In the medical realm, AI takes a pivotal role, leveraging robust machine learning (ML) algorithms. AI technology in medical imaging aids physicians in X-ray, computed tomography (CT) scans, and magnetic resonance imaging (MRI) diagnoses, conducts pattern recognition and disease prediction based on acoustic data, delivers prognoses on disease types and developmental trends for patients, and employs intelligent health management wearable devices with human-computer interaction technology to name but a few. While these well-established applications have significantly assisted in medical field diagnoses, clinical decision-making, and management, collaboration between the medical and AI sectors faces an urgent challenge: How to substantiate the reliability of decision-making? The underlying issue stems from the conflict between the demand for accountability and result transparency in medical scenarios and the black-box model traits of AI. This article reviews recent research grounded in explainable artificial intelligence (XAI), with an emphasis on medical practices within the visual, audio, and multimodal perspectives. We endeavour to categorise and synthesise these practices, aiming to provide support and guidance for future researchers and healthcare professionals.

AIDec 19, 2024
Towards Friendly AI: A Comprehensive Review and New Perspectives on Human-AI Alignment

Qiyang Sun, Yupei Li, Emran Alturki et al.

As Artificial Intelligence (AI) continues to advance rapidly, Friendly AI (FAI) has been proposed to advocate for more equitable and fair development of AI. Despite its importance, there is a lack of comprehensive reviews examining FAI from an ethical perspective, as well as limited discussion on its potential applications and future directions. This paper addresses these gaps by providing a thorough review of FAI, focusing on theoretical perspectives both for and against its development, and presenting a formal definition in a clear and accessible format. Key applications are discussed from the perspectives of eXplainable AI (XAI), privacy, fairness and affective computing (AC). Additionally, the paper identifies challenges in current technological advancements and explores future research avenues. The findings emphasise the significance of developing FAI and advocate for its continued advancement to ensure ethical and beneficial AI development.

SDOct 14, 2024
Audio-based Kinship Verification Using Age Domain Conversion

Qiyang Sun, Alican Akman, Xin Jing et al.

Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we design the notion of an "age-standardised domain" wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.

LGOct 3, 2025
RAxSS: Retrieval-Augmented Sparse Sampling for Explainable Variable-Length Medical Time Series Classification

Aydin Javadov, Samir Garibov, Tobias Hoesli et al.

Medical time series analysis is challenging due to data sparsity, noise, and highly variable recording lengths. Prior work has shown that stochastic sparse sampling effectively handles variable-length signals, while retrieval-augmented approaches improve explainability and robustness to noise and weak temporal correlations. In this study, we generalize the stochastic sparse sampling framework for retrieval-informed classification. Specifically, we weight window predictions by within-channel similarity and aggregate them in probability space, yielding convex series-level scores and an explicit evidence trail for explainability. Our method achieves competitive iEEG classification performance and provides practitioners with greater transparency and explainability. We evaluate our method in iEEG recordings collected in four medical centers, demonstrating its potential for reliable and explainable clinical variable-length time series classification.

IVJul 12, 2025
Automatic Contouring of Spinal Vertebrae on X-Ray using a Novel Sandwich U-Net Architecture

Sunil Munthumoduku Krishna Murthy, Kumar Rajamani, Srividya Tirunellai Rajamani et al.

In spinal vertebral mobility disease, accurately extracting and contouring vertebrae is essential for assessing mobility impairments and monitoring variations during flexion-extension movements. Precise vertebral contouring plays a crucial role in surgical planning; however, this process is traditionally performed manually by radiologists or surgeons, making it labour-intensive, time-consuming, and prone to human error. In particular, mobility disease analysis requires the individual contouring of each vertebra, which is both tedious and susceptible to inconsistencies. Automated methods provide a more efficient alternative, enabling vertebra identification, segmentation, and contouring with greater accuracy and reduced time consumption. In this study, we propose a novel U-Net variation designed to accurately segment thoracic vertebrae from anteroposterior view on X-Ray images. Our proposed approach, incorporating a ``sandwich" U-Net structure with dual activation functions, achieves a 4.1\% improvement in Dice score compared to the baseline U-Net model, enhancing segmentation accuracy while ensuring reliable vertebral contour extraction.