Haihan Duan

h-index12

3papers

786citations

Novelty43%

AI Score39

Ranked #77,780 of 194,257 authors (top 40%)#26,320 in CV (top 45%)

3 Papers

20.1CVMar 8, 2023Code

A Light Weight Model for Active Speaker Detection

Junhua Liao, Haihan Duan, Kanghui Feng et al.

Active speaker detection is a challenging task in audio-visual scenario understanding, which aims to detect who is speaking in one or more speakers scenarios. This task has received extensive attention as it is crucial in applications such as speaker diarization, speaker tracking, and automatic video editing. The existing studies try to improve performance by inputting multiple candidate information and designing complex models. Although these methods achieved outstanding performance, their high consumption of memory and computational power make them difficult to be applied in resource-limited scenarios. Therefore, we construct a lightweight active speaker detection architecture by reducing input candidates, splitting 2D and 3D convolutions for audio-visual feature extraction, and applying gated recurrent unit (GRU) with low computational complexity for cross-modal modeling. Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94.1% vs. 94.2%), while the resource costs are significantly lower than the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x) and FLOPs (0.6G vs. 2.6G, about 4x). In addition, our framework also performs well on the Columbia dataset showing good robustness. The code and model weights are available at https://github.com/Junhua-Liao/Light-ASD.

4.1LGOct 31, 2025

FedSM: Robust Semantics-Guided Feature Mixup for Bias Reduction in Federated Learning with Long-Tail Data

Jingrui Zhang, Yimeng Xu, Shujie Li et al.

Federated Learning (FL) enables collaborative model training across decentralized clients without sharing private data. However, FL suffers from biased global models due to non-IID and long-tail data distributions. We propose \textbf{FedSM}, a novel client-centric framework that mitigates this bias through semantics-guided feature mixup and lightweight classifier retraining. FedSM uses a pretrained image-text-aligned model to compute category-level semantic relevance, guiding the category selection of local features to mix-up with global prototypes to generate class-consistent pseudo-features. These features correct classifier bias, especially when data are heavily skewed. To address the concern of potential domain shift between the pretrained model and the data, we propose probabilistic category selection, enhancing feature diversity to effectively mitigate biases. All computations are performed locally, requiring minimal server overhead. Extensive experiments on long-tail datasets with various imbalanced levels demonstrate that FedSM consistently outperforms state-of-the-art methods in accuracy, with high robustness to domain shift and computational efficiency.

13.0MMAug 20, 2021

Metaverse for Social Good: A University Campus Prototype

Haihan Duan, Jiaye Li, Sizheng Fan et al.

In recent years, the metaverse has attracted enormous attention from around the world with the development of related technologies. The expected metaverse should be a realistic society with more direct and physical interactions, while the concepts of race, gender, and even physical disability would be weakened, which would be highly beneficial for society. However, the development of metaverse is still in its infancy, with great potential for improvement. Regarding metaverse's huge potential, industry has already come forward with advance preparation, accompanied by feverish investment, but there are few discussions about metaverse in academia to scientifically guide its development. In this paper, we highlight the representative applications for social good. Then we propose a three-layer metaverse architecture from a macro perspective, containing infrastructure, interaction, and ecosystem. Moreover, we journey toward both a historical and novel metaverse with a detailed timeline and table of specific attributes. Lastly, we illustrate our implemented blockchain-driven metaverse prototype of a university campus and discuss the prototype design and insights.