CL | Sep 26, 2025

ArabJobs: A Multinational Corpus of Arabic Job Ads

arXiv:2509.22589v1 | 3 citations | Has Code | Proceedings of The Third Arabic Natural Language Processing Conference
Originality: Synthesis-oriented
AI Analysis

ArabJobs provides a resource for fairness-aware Arabic NLP and labour market research, though the contribution is incremental: it applies existing methods to new data.

The researchers tackled the lack of diverse Arabic job ad datasets by creating ArabJobs, a corpus of over 8,500 postings from four countries, enabling analyses of gender representation and dialectal variation, and demonstrating applications like salary estimation with large language models.

ArabJobs is a publicly available corpus of Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the United Arab Emirates. Comprising over 8,500 postings and more than 550,000 words, the dataset captures linguistic, regional, and socio-economic variation in the Arab labour market. We present analyses of gender representation and occupational structure, and highlight dialectal variation across ads, which offers opportunities for future research. We also demonstrate applications such as salary estimation and job category normalisation using large language models, alongside benchmark tasks for gender bias detection and profession classification. The findings show the utility of ArabJobs for fairness-aware Arabic NLP and labour market research. The dataset is publicly available on GitHub: https://github.com/drelhaj/ArabJobs.

