ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning
This work addresses the problem of sparse interaction data between companies and job seekers in recruitment systems, though the contribution is incremental, as it builds on existing contrastive learning and augmentation techniques.
The paper tackles the sparsity problem in resume-job matching datasets using data augmentation and contrastive learning, improving nDCG@10 by up to 19% absolute for ranking jobs and up to 31% absolute for ranking resumes.
A reliable resume-job matching system helps a company find suitable candidates from a pool of resumes, and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction records in resume-job datasets are sparse. Unlike many prior works that use complex modeling techniques, we tackle this sparsity problem using data augmentations and a simple contrastive learning approach. ConFit first creates an augmented resume-job dataset by paraphrasing specific sections in a resume or a job post. Then, ConFit uses contrastive learning to further increase training samples from $B$ pairs per batch to $O(B^2)$ per batch. We evaluate ConFit on two real-world datasets and find it outperforms prior methods (including BM25 and OpenAI text-ada-002) by up to 19% and 31% absolute in nDCG@10 for ranking jobs and ranking resumes, respectively.
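To illustrate how a batch of $B$ labeled pairs can yield $O(B^2)$ training signals, here is a minimal sketch of an in-batch contrastive loss. It assumes a symmetric InfoNCE-style objective over a $B \times B$ resume-job similarity matrix; the abstract does not specify the exact loss, so the function names, the `temperature` parameter, and the symmetric averaging are illustrative assumptions rather than the paper's implementation.

```python
import math


def in_batch_contrastive_loss(sim, temperature=1.0):
    """Symmetric InfoNCE-style loss over a B x B similarity matrix (assumed form).

    sim[i][j] is the similarity between resume i and job j. The diagonal
    holds the B labeled pairs; every off-diagonal entry is treated as a
    negative, so all B * B entries contribute to the loss.
    """
    b = len(sim)

    def mean_row_loss(matrix):
        total = 0.0
        for i in range(b):
            logits = [matrix[i][j] / temperature for j in range(b)]
            # Numerically stable log-sum-exp over the row.
            peak = max(logits)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
            # Cross-entropy with the diagonal entry as the correct class.
            total += log_z - logits[i]
        return total / b

    transposed = [[sim[j][i] for j in range(b)] for i in range(b)]
    # Average the resume-to-job and job-to-resume directions.
    return 0.5 * (mean_row_loss(sim) + mean_row_loss(transposed))


def count_scored_pairs(batch_size):
    """Number of resume-job combinations scored in one batch."""
    return batch_size * batch_size
```

With a batch of 32 labeled pairs, for example, `count_scored_pairs(32)` gives 1024 scored combinations, of which 32 are positives and the rest serve as in-batch negatives under this assumed setup.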