CLAIIR, Sep 13, 2023

Résumé Parsing as Hierarchical Sequence Labeling: An Empirical Study

arXiv:2309.07015v1
5 citations
h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses automated résumé parsing for HR and recruitment systems, presenting an incremental improvement together with new multilingual datasets.

The paper reformulates résumé information extraction as a two-level hierarchical sequence labeling problem and reports improved performance over previous methods across multiple languages.

Extracting information from résumés is typically formulated as a two-stage problem, where the document is first segmented into sections and then each section is processed individually to extract the target entities. Instead, we cast the whole problem as sequence labeling in two levels -- lines and tokens -- and study model architectures for solving both tasks simultaneously. We build high-quality résumé parsing corpora in English, French, Chinese, Spanish, German, Portuguese, and Swedish. Based on these corpora, we present experimental results that demonstrate the effectiveness of the proposed models for the information extraction task, outperforming approaches introduced in previous work. We conduct an ablation study of the proposed architectures. We also analyze both model performance and resource efficiency, and describe the trade-offs for model deployment in the context of a production environment.
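To make the two-level formulation in the abstract concrete, here is a minimal sketch of how a résumé might carry labels at both the line and the token level. It only illustrates the data layout and is not the authors' implementation: the class `LabeledLine`, the function `label_document`, the BIO-style tags, and the label names (`PersonalInfo`, `Experience`, `Name`, `JobTitle`, `Employer`) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledLine:
    """One résumé line carrying labels at both levels (hypothetical layout)."""
    tokens: List[str]
    line_label: str          # assumed line-level tag, e.g. which section the line belongs to
    token_labels: List[str]  # assumed token-level tags, one per token

    def __post_init__(self) -> None:
        # Sequence labeling requires exactly one label per token.
        if len(self.tokens) != len(self.token_labels):
            raise ValueError("each token needs exactly one token-level label")


def label_document(
    lines: List[LabeledLine],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return the line-level label sequence and the flattened token-level sequence."""
    line_sequence = [line.line_label for line in lines]
    token_sequence = [
        (token, label)
        for line in lines
        for token, label in zip(line.tokens, line.token_labels)
    ]
    return line_sequence, token_sequence


# Toy document; the text and every label name are made up for illustration.
resume = [
    LabeledLine(["Jane", "Doe"], "B-PersonalInfo", ["B-Name", "I-Name"]),
    LabeledLine(["Work", "Experience"], "B-Experience", ["O", "O"]),
    LabeledLine(
        ["Engineer", "at", "Acme"],
        "I-Experience",
        ["B-JobTitle", "O", "B-Employer"],
    ),
]

line_sequence, token_sequence = label_document(resume)
print(line_sequence)
print(token_sequence)
```

In this layout a single model can be trained to predict both sequences over the same document, in contrast to the two-stage pipeline that first segments sections and then extracts entities from each section separately.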

Code Implementations: 1 repo
