CL · SI · Jan 30, 2019

Twitter Job/Employment Corpus: A Dataset of Job-Related Discourse Built with Humans in the Loop

arXiv:1901.10619v1
Originality: Synthesis-oriented
AI Analysis

The corpus provides a new benchmark dataset for job-related topic extraction and analysis, which may benefit multiple research communities.

The authors created the Twitter Job/Employment Corpus, a dataset of job-related tweets annotated using a humans-in-the-loop supervised learning framework. It addresses a gap in previous studies, which relied on employer-hosted workplace social media and therefore lacked independent, open-domain data on job-related discourse.

We present the Twitter Job/Employment Corpus, a collection of tweets annotated by a humans-in-the-loop supervised learning framework that integrates crowdsourcing contributions and expertise on the local community and employment environment. Previous computational studies of job-related phenomena have used corpora collected from workplace social media that are hosted internally by the employers, and so lack independence from latent job-related coercion and the broader context that an open-domain, general-purpose medium such as Twitter provides. Our new corpus promises to be a benchmark for the extraction of job-related topics and advanced analysis and modeling, and can potentially benefit a wide range of research communities in the future.

