IR · May 17, 2017

JCTC: A Large Job posting Corpus for Text Classification

arXiv:1705.06123v2 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem that online job information is inaccessible to organizations doing labor market analysis. The contribution is incremental, as it builds on existing methods for corpus construction.

The authors tackled the lack of a suitable text classification corpus for labor market analysis by introducing JCTC, a large job posting corpus with 102,581 postings across 465 categories, and benchmarked five state-of-the-art deep learning approaches to provide baseline results.

The absence of an appropriate text classification corpus makes the massive amount of online job information unusable for labor market analysis. This paper presents JCTC, a large job posting corpus for text classification. In the JCTC construction framework, a formal specification issued by the Chinese central government is chosen as the classification standard. An unsupervised learning method (WE-cos), a supervised learning algorithm (SVM), and human judgements are all used in the construction process. JCTC has 102,581 online job postings distributed across 465 categories. The proposed method not only eases the high demands on people's skill and knowledge but also reduces subjective influences. Moreover, the method is not limited to Chinese. We benchmark five state-of-the-art deep learning approaches on JCTC, providing baseline results for future studies. JCTC might be the first job posting corpus for text classification and the largest one in Chinese. With the help of JCTC, related organizations are able to monitor, analyze, and predict the labor market in a comprehensive, accurate, and timely manner.
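
The abstract names WE-cos without defining it. As a rough illustration only, the sketch below assumes it means scoring a posting against each category by the cosine similarity of their averaged word embeddings. The function names, the toy two-dimensional embeddings, and the two categories are all invented for this example and are not taken from the paper or from JCTC.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have one; None if none do."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_categories(posting_tokens, category_tokens, embeddings):
    """Return (category, similarity) pairs sorted by descending similarity."""
    p = mean_vector(posting_tokens, embeddings)
    scores = []
    for name, tokens in category_tokens.items():
        c = mean_vector(tokens, embeddings)
        if p is not None and c is not None:
            scores.append((name, cosine(p, c)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy 2-d embeddings, invented for illustration only (assumption).
TOY_EMBEDDINGS = {
    "python": [0.9, 0.1], "developer": [0.8, 0.2], "software": [1.0, 0.0],
    "nurse": [0.1, 0.9], "hospital": [0.0, 1.0], "care": [0.2, 0.8],
}
# Hypothetical category descriptions, not the government standard's 465 classes.
TOY_CATEGORIES = {
    "software engineer": ["software", "developer"],
    "nursing staff": ["nurse", "hospital"],
}

ranking = rank_categories(["python", "developer"], TOY_CATEGORIES, TOY_EMBEDDINGS)
print(ranking[0][0])  # best-matching toy category
```

In a pipeline like the one the abstract describes, such unsupervised scores could serve as candidate labels, with the SVM and human judgement steps used afterwards to confirm or correct them. How the paper actually combines the three is not stated in this summary.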

