CL · May 3, 2022

Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning

arXiv:2205.01381v1 · 587 citations · h-index: 46 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated skill extraction in the Danish job market, but it is incremental: it applies existing methods to a new language and dataset.

The authors tackled fine-grained skill classification in Danish job postings by creating the first Danish job posting dataset and using distant supervision from the ESCO taxonomy to obtain fine-grained labels. Their results showed that RemBERT significantly outperformed all other models, including in-language Danish models, in both the zero-shot and the few-shot setting.

Skill Classification (SC) is the task of classifying job competences from job postings. This work is the first to apply SC to Danish job vacancy data. We release the first Danish job posting dataset, Kompetencer (en: competences), annotated for nested spans of competences. To improve upon coarse-grained annotations, we use The European Skills, Competences, Qualifications and Occupations (ESCO; le Vrang et al., 2014) taxonomy API to obtain fine-grained labels via distant supervision. We study two setups: the zero-shot and the few-shot classification setting. We fine-tune English-based models and RemBERT (Chung et al., 2020) and compare them to in-language Danish models. Our results show that RemBERT significantly outperforms all other models in both the zero-shot and the few-shot setting.
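The abstract describes distant supervision: coarse-grained annotated spans are mapped onto fine-grained ESCO entries through the ESCO taxonomy API. The sketch below only illustrates that idea under stated assumptions and is not the authors' pipeline. It replaces the ESCO API with a hypothetical three-entry in-memory taxonomy (the IDs such as `skill/teamwork` are made up) and swaps in character-level string similarity from Python's `difflib` as the matching step. The names `TAXONOMY`, `similarity`, `distant_label` and the `threshold` value are illustrative only.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical stand-in for ESCO: a few made-up skill IDs mapped to labels.
# The real ESCO taxonomy uses URIs and is queried through its web API.
TAXONOMY: Dict[str, str] = {
    "skill/communication": "communication",
    "skill/teamwork": "work in teams",
    "skill/spreadsheets": "use spreadsheets software",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distant_label(span: str, taxonomy: Dict[str, str],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the ID of the most similar taxonomy entry for an annotated span,
    or None if no entry reaches the (assumed) similarity threshold."""
    best_id: Optional[str] = None
    best_score = 0.0
    for label_id, preferred_label in taxonomy.items():
        score = similarity(span, preferred_label)
        if score > best_score:
            best_id, best_score = label_id, score
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    # Coarse-grained spans, as an annotator might mark them in a posting.
    for span in ["Work in a team", "spreadsheet software", "xyz"]:
        print(span, "->", distant_label(span, TAXONOMY))
```

A span that matches nothing closely enough keeps only its coarse-grained annotation. A real setup would depend on how the taxonomy lookup ranks candidates and would need care for Danish spans versus English labels.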

Code Implementations: 1 repo