CL · May 20, 2023

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

arXiv:2305.12092v1226 citations
Originality: Incremental advance
AI Analysis

This paper addresses job-related NLP tasks across multiple languages in the computational job market domain. It is an incremental advance that relies on domain-specific adaptation of an existing model.

The study tackled the lack of generalized multilingual models for job market NLP tasks by introducing ESCOXLM-R, a model pre-trained on the ESCO taxonomy across 27 languages, which achieved state-of-the-art results on 6 out of 9 datasets in tasks like skill extraction and classification.

The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification. While some approaches have been developed that are specific to the job market domain, there is a lack of generalized, multilingual models and benchmarks for these tasks. In this study, we introduce a language model called ESCOXLM-R, based on XLM-R, which uses domain-adaptive pre-training on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages. The pre-training objectives for ESCOXLM-R include dynamic masked language modeling and a novel additional objective for inducing multilingual taxonomical ESCO relations. We comprehensively evaluate the performance of ESCOXLM-R on 6 sequence labeling and 3 classification tasks in 4 languages and find that it achieves state-of-the-art results on 6 out of 9 datasets. Our analysis reveals that ESCOXLM-R performs better on short spans and outperforms XLM-R on entity-level and surface-level span-F1, likely due to ESCO containing short skill and occupation titles, and encoding information on the entity-level.
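The abstract names dynamic masked language modeling as one pre-training objective. As a rough illustration only, the sketch below shows what "dynamic" means here: the masked positions are re-sampled every time a sequence is seen, instead of being fixed once during preprocessing. The 15% masking rate, the 80/10/10 replacement split, the `<mask>` string, the toy sentence, and the function name `dynamic_mask` are all assumptions borrowed from common BERT-style practice. They are not taken from the paper, and the taxonomy-relation objective is not covered.

```python
import random

MASK = "<mask>"  # assumed mask symbol, for illustration only


def dynamic_mask(tokens, vocab, rng, mask_prob=0.15):
    """Return (inputs, labels) with a freshly sampled mask.

    labels[i] holds the original token at positions the model must predict
    and None elsewhere. The 15% rate and 80/10/10 split are assumptions.
    """
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # replace with the mask symbol
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # replace with a random token
        # otherwise keep the original token unchanged
    return inputs, labels


# Toy example: the same sentence gets a different mask on each pass.
tokens = "develop software in python and manage project teams".split()
vocab = sorted(set(tokens))
rng = random.Random(0)
epoch_views = [dynamic_mask(tokens, vocab, rng) for _ in range(3)]
for inputs, _ in epoch_views:
    print(" ".join(inputs))
```

Calling the function inside the data loader, instead of once before training, is what makes the masking dynamic: each epoch presents the model with a different prediction problem over the same text.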

Code Implementations: 1 repo
