Rui Sang

SD
h-index3
5papers
7citations
Novelty51%
AI Score42

5 Papers

SDSep 5, 2025
MAIA: An Inpainting-Based Approach for Music Adversarial Attacks

Yuxuan Liu, Peihong Zhang, Rui Sang et al.

Music adversarial attacks have garnered significant interest in the field of Music Information Retrieval (MIR). In this paper, we present Music Adversarial Inpainting Attack (MAIA), a novel adversarial attack framework that supports both white-box and black-box attack scenarios. MAIA begins with an importance analysis to identify critical audio segments, which are then targeted for modification. Utilizing generative inpainting models, these segments are reconstructed with guidance from the output of the attacked model, ensuring subtle and effective adversarial perturbations. We evaluate MAIA on multiple MIR tasks, demonstrating high attack success rates in both white-box and black-box settings while maintaining minimal perceptual distortion. Additionally, subjective listening tests confirm the high audio fidelity of the adversarial samples. Our findings highlight vulnerabilities in current MIR systems and emphasize the need for more robust and secure models.

SDSep 14, 2025
An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift

Peihong Zhang, Yuxuan Liu, Zhixin Li et al.

Acoustic Scene Classification (ASC) faces challenges in generalizing across recording devices, particularly when labeled data is limited. The DCASE 2024 Challenge Task 1 highlights this issue by requiring models to learn from small labeled subsets recorded on a few devices. These models need to then generalize to recordings from previously unseen devices under strict complexity constraints. While techniques such as data augmentation and the use of pre-trained models are well-established for improving model generalization, optimizing the training strategy represents a complementary yet less-explored path that introduces no additional architectural complexity or inference overhead. Among various training strategies, curriculum learning offers a promising paradigm by structuring the learning process from easier to harder examples. In this work, we propose an entropy-guided curriculum learning strategy to address the domain shift problem in data-efficient ASC. Specifically, we quantify the uncertainty of device domain predictions for each training sample by computing the Shannon entropy of the device posterior probabilities estimated by an auxiliary domain classifier. Using entropy as a proxy for domain invariance, the curriculum begins with high-entropy samples and gradually incorporates low-entropy, domain-specific ones to facilitate the learning of generalizable representations. Experimental results on multiple DCASE 2024 ASC baselines demonstrate that our strategy effectively mitigates domain shift, particularly under limited labeled data conditions. Our strategy is architecture-agnostic and introduces no additional inference cost, making it easily integrable into existing ASC baselines and offering a practical solution to domain shift.

SPMay 14, 2025
NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection

Peihong Zhang, Zhixin Li, Rui Sang et al.

The coupling signal refers to a latent physiological signal that characterizes the transformation from cardiac electrical excitation, captured by the electrocardiogram (ECG), to mechanical contraction, recorded by the phonocardiogram (PCG). By encoding the temporal and functional interplay between electrophysiological and hemodynamic events, it serves as an intrinsic link between modalities and offers a unified representation of cardiac function, with strong potential to enhance multi-modal cardiovascular disease (CVD) detection. However, existing coupling signal estimation methods remain highly vulnerable to noise, particularly in real-world clinical and physiological settings, which undermines their robustness and limits practical value. In this study, we propose Noise-Robust Multi-Modal Coupling Signal Estimation (NMCSE), which reformulates coupling signal estimation as a distribution matching problem solved via optimal transport. By jointly aligning amplitude and timing, NMCSE avoids noise amplification and enables stable signal estimation. When integrated into a Temporal-Spatial Feature Extraction (TSFE) network, the estimated coupling signal effectively enhances multi-modal fusion for more accurate CVD detection. To evaluate robustness under real-world conditions, we design two complementary experiments targeting distinct sources of noise. The first uses the PhysioNet 2016 dataset with simulated hospital noise to assess the resilience of NMCSE to clinical interference. The second leverages the EPHNOGRAM dataset with motion-induced physiological noise to evaluate intra-state estimation stability across activity levels. Experimental results show that NMCSE consistently outperforms existing methods under both clinical and physiological noise, highlighting it as a noise-robust estimation approach that enables reliable multi-modal cardiac detection in real-world conditions.

SDOct 20, 2025
TopSeg: A Multi-Scale Topological Framework for Data-Efficient Heart Sound Segmentation

Peihong Zhang, Zhixin Li, Yuxuan Liu et al.

Deep learning approaches for heart-sound (PCG) segmentation built on time--frequency features can be accurate but often rely on large expert-labeled datasets, limiting robustness and deployment. We present TopSeg, a topological representation-centric framework that encodes PCG dynamics with multi-scale topological features and decodes them using a lightweight temporal convolutional network (TCN) with an order- and duration-constrained inference step. To evaluate data efficiency and generalization, we train exclusively on PhysioNet 2016 dataset with subject-level subsampling and perform external validation on CirCor dataset. Under matched-capacity decoders, the topological features consistently outperform spectrogram and envelope inputs, with the largest margins at low data budgets; as a full system, TopSeg surpasses representative end-to-end baselines trained on their native inputs under the same budgets while remaining competitive at full data. Ablations at 10% training confirm that all scales contribute and that combining H_0 and H_1 yields more reliable S1/S2 localization and boundary stability. These results indicate that topology-aware representations provide a strong inductive bias for data-efficient, cross-dataset PCG segmentation, supporting practical use when labeled data are limited.

SDOct 20, 2025
DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification under Domain Shift

Peihong Zhang, Yuxuan Liu, Rui Sang et al.

Acoustic scene classification (ASC) suffers from device-induced domain shift, especially when labels are limited. Prior work focuses on curriculum-based training schedules that structure data presentation by ordering or reweighting training examples from easy-to-hard to facilitate learning; however, existing curricula are static, fixing the ordering or the weights before training and ignoring that example difficulty and marginal utility evolve with the learned representation. To overcome this limitation, we propose the Dynamic Dual-Signal Curriculum (DDSC), a training schedule that adapts the curriculum online by combining two signals computed each epoch: a domain-invariance signal and a learning-progress signal. A time-varying scheduler fuses these signals into per-example weights that prioritize domain-invariant examples in early epochs and progressively emphasize device-specific cases. DDSC is lightweight, architecture-agnostic, and introduces no additional inference overhead. Under the official DCASE 2024 Task~1 protocol, DDSC consistently improves cross-device performance across diverse ASC baselines and label budgets, with the largest gains on unseen-device splits.