CL, LG | Apr 11, 2022

"FIJO": a French Insurance Soft Skill Detection Dataset

arXiv:2204.05208v2 | 10 citations | h-index: 8
Originality: Synthesis-oriented
AI Analysis

FIJO gives NLP and labor-market researchers and practitioners a public dataset, though the contribution is incremental because it covers a single domain (French insurance job offers).

The authors tackled the lack of publicly available annotated data for skill detection in job ads by introducing FIJO, a French insurance job offer dataset with soft skill annotations, and demonstrated that transformer-based models achieve good token-wise performance on it.

Understanding the evolution of job requirements is becoming more important for workers, companies and public organizations that need to follow the fast transformation of the employment market. Fortunately, recent natural language processing (NLP) approaches allow for the development of methods to automatically extract information from job ads and recognize skills more precisely. However, these efficient approaches need a large amount of annotated data from the studied domain, which is difficult to access, mainly due to intellectual property. This article proposes a new public dataset, FIJO, containing insurance job offers, including many soft skill annotations. To understand the potential of this dataset, we detail some characteristics and some limitations. Then, we present the results of skill detection algorithms using a named entity recognition approach and show that transformer-based models have good token-wise performance on this dataset. Lastly, we analyze some errors made by our best model to emphasize the difficulties that may arise when applying NLP approaches.
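The abstract reports token-wise performance for skill detection framed as named entity recognition. As a minimal sketch of what a token-wise score measures, the snippet below computes per-label precision, recall and F1 over aligned tag sequences. The function name `token_wise_scores` and the labels `SKILL_A` and `SKILL_B` are hypothetical placeholders, not the dataset's actual classes, and this is not the authors' evaluation code.

```python
from collections import Counter


def token_wise_scores(gold, pred, outside="O"):
    """Per-label token-wise (precision, recall, F1) for two aligned tag sequences.

    Each token is scored independently: a token counts as correct for a label
    only if the gold and predicted tags are identical. Tokens tagged `outside`
    in both sequences are ignored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != outside:
                tp[g] += 1
        else:
            if p != outside:
                fp[p] += 1  # predicted a label the gold token does not carry
            if g != outside:
                fn[g] += 1  # missed the gold label on this token

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical six-token sentence; labels are placeholders.
gold = ["O", "SKILL_A", "SKILL_A", "O", "SKILL_B", "O"]
pred = ["O", "SKILL_A", "O", "O", "SKILL_B", "SKILL_B"]
for label, (p, r, f) in token_wise_scores(gold, pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scoring per token rather than per whole span gives partial credit when a model recovers only part of a multi-word skill mention, which is one reason token-wise figures can look better than span-level ones.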

Code Implementations: 2 repos
