AI · Aug 19, 2025

MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model

arXiv:2508.13676v2 · h-index: 44 · CIKM
Originality: Incremental advance
AI Analysis

This paper addresses the problem of maintaining a high-quality talent pool for recruiters by detecting duplicates among incomplete and heterogeneous resumes. The contribution appears incremental, as it builds on existing models such as BGE-M3.

The paper tackles duplicate resume detection by proposing MHSNet, a framework that fine-tunes BGE-M3 with contrastive learning and uses Mixture-of-Experts to generate multi-level semantic representations, achieving improved accuracy in experiments.

To maintain the company's talent pool, recruiters need to continuously search for resumes from third-party websites (e.g., LinkedIn, Indeed). However, fetched resumes are often incomplete and inaccurate. To improve the quality of third-party resumes and enrich the company's talent pool, it is essential to conduct duplication detection between the fetched resumes and those already in the company's talent pool. Such duplication detection is challenging due to the semantic complexity, structural heterogeneity, and information incompleteness of resume texts. To this end, we propose MHSNet, a multi-level identity verification framework that fine-tunes BGE-M3 using contrastive learning. With the fine-tuned BGE-M3, a Mixture-of-Experts (MoE) module generates multi-level sparse and dense representations for resumes, enabling the computation of corresponding multi-level semantic similarities. Moreover, a state-aware MoE is employed in MHSNet to handle diverse incomplete resumes. Experimental results verify the effectiveness of MHSNet.
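As a rough illustration of how multi-level similarities might be combined when some resume sections are missing, here is a minimal sketch. It is not the paper's architecture: the abstract does not specify the gating mechanism, so the function `gated_similarity`, the level names, and the fixed `gate_logits` (which a real model would learn) are all hypothetical. The sketch only shows the general idea of restricting a softmax gate to the levels that both resumes actually contain.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gated_similarity(resume_a, resume_b, gate_logits):
    """Combine per-level similarities, skipping levels missing from either resume.

    resume_a, resume_b: dict of level name -> embedding (list of floats), or None if absent.
    gate_logits: dict of level name -> unnormalised importance score
                 (hypothetical; assumed to be learned in a real system).
    """
    # "State-aware" here simply means: only levels present in BOTH resumes take part.
    present = [
        level for level in gate_logits
        if resume_a.get(level) is not None and resume_b.get(level) is not None
    ]
    if not present:
        return 0.0

    # Numerically stable softmax over the present levels only.
    max_logit = max(gate_logits[level] for level in present)
    exps = {level: math.exp(gate_logits[level] - max_logit) for level in present}
    total = sum(exps.values())

    return sum(
        exps[level] / total * cosine(resume_a[level], resume_b[level])
        for level in present
    )
```

For example, if one resume lacks an `experience` section, that level is dropped and the remaining gate weights are renormalised, so the missing section neither penalises nor rewards the pair.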

