Yaodan Zhang

h-index15
2papers

2 Papers

SDNov 12, 2024
SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

Xinyuan Qian, Jiaran Gao, Yaodan Zhang et al.

Speech enhancement plays an essential role in various applications, and the integration of visual information has been demonstrated to bring substantial advantages. However, the majority of current research concentrates on the examination of facial and lip movements, which can be compromised or entirely inaccessible in scenarios where occlusions occur or when the camera view is distant. Whereas contextual visual cues from the surrounding environment have been overlooked: for example, when we see a dog bark, our brain has the innate ability to discern and filter out the barking noise. To this end, in this paper, we introduce a novel task, i.e. SAV-SE. To our best knowledge, this is the first proposal to use rich contextual information from synchronized video as auxiliary cues to indicate the type of noise, which eventually improves the speech enhancement performance. Specifically, we propose the VC-S$^2$E method, which incorporates the Conformer and Mamba modules for their complementary strengths. Extensive experiments are conducted on public MUSIC, AVSpeech and AudioSet datasets, where the results demonstrate the superiority of VC-S$^2$E over other competitive methods. We will make the source code publicly available. Project demo page: https://AVSEPage.github.io/

IRMar 5, 2024
A Distance Metric Learning Model Based On Variational Information Bottleneck

YaoDan Zhang, Zidong Wang, Ru Jia et al.

In recent years, personalized recommendation technology has flourished and become one of the hot research directions. The matrix factorization model and the metric learning model which proposed successively have been widely studied and applied. The latter uses the Euclidean distance instead of the dot product used by the former to measure the latent space vector. While avoiding the shortcomings of the dot product, the assumption of Euclidean distance is neglected, resulting in limited recommendation quality of the model. In order to solve this problem, this paper combines the Variationl Information Bottleneck with metric learning model for the first time, and proposes a new metric learning model VIB-DML (Variational Information Bottleneck Distance Metric Learning) for rating prediction, which limits the mutual information of the latent space feature vector to improve the robustness of the model and satisfiy the assumption of Euclidean distance by decoupling the latent space feature vector. In this paper, the experimental results are compared with the root mean square error (RMSE) on the three public datasets. The results show that the generalization ability of VIB-DML is excellent. Compared with the general metric learning model MetricF, the prediction error is reduced by 7.29%. Finally, the paper proves the strong robustness of VIBDML through experiments.