An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit

Warren Jouanneau, Emma Jouffroy, Marc Palyart

arXiv:2601.10321, h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses the challenge of real-time person-job fit for recruiters and job platforms, although it appears incremental because it builds on existing re-ranking and distillation techniques.

The paper tackles the problem of efficiently matching long, structured, and multilingual resumes to job proposals. It proposes a re-ranking model with a late cross-attention architecture, trained by distilling supervision from an LLM, and reports better relevance, ranking, and calibration metrics than state-of-the-art baselines.

Finding the most relevant person for a job proposal in real time is challenging, especially when resumes are long, structured, and multilingual. In this paper, we propose a re-ranking model based on a new generation of late cross-attention architecture that decomposes both resumes and project briefs to efficiently handle long-context inputs with minimal computational overhead. To mitigate historical data biases, we use a generative large language model (LLM) as a teacher, generating fine-grained, semantically grounded supervision. This signal is distilled into our student model via an enriched distillation loss function. The resulting model produces skill-fit scores that enable consistent and interpretable person-job matching. Experiments on relevance, ranking, and calibration metrics demonstrate that our approach outperforms state-of-the-art baselines.
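The mechanism summarized in the abstract can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation. It assumes each resume and project brief has already been split into sections and each section encoded into a fixed-size vector, lets every brief section attend over the resume sections with scaled dot-product attention, and squashes the mean agreement into a (0, 1) fit score. The distillation step is represented by plain binary cross-entropy against a teacher score assumed to be normalized to [0, 1]; the paper's enriched distillation loss is not specified on this page, so this is only a stand-in. Expected calibration error (ECE) is included as one common calibration metric, not necessarily the one the authors report. All function names (`late_cross_attention`, `fit_score`, `distillation_bce`, `expected_calibration_error`) and the toy vectors are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def late_cross_attention(brief: List[Vector], resume: List[Vector]) -> List[Vector]:
    """Each brief section (query) attends over all resume sections (keys/values).

    Assumption: sections were encoded independently beforehand, so only this
    lightweight interaction step mixes information from the two documents.
    """
    d = len(brief[0])
    attended = []
    for q in brief:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in resume])
        attended.append([sum(w * k[i] for w, k in zip(weights, resume)) for i in range(d)])
    return attended


def fit_score(brief: List[Vector], resume: List[Vector],
              scale: float = 1.0, bias: float = 0.0) -> float:
    """Hypothetical skill-fit score in (0, 1): sigmoid of mean query/context agreement."""
    attended = late_cross_attention(brief, resume)
    sims = [dot(q, a) for q, a in zip(brief, attended)]
    logit = scale * sum(sims) / len(sims) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def distillation_bce(student: float, teacher: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy against the teacher's soft label (a stand-in loss)."""
    p = min(max(student, eps), 1.0 - eps)
    return -(teacher * math.log(p) + (1.0 - teacher) * math.log(1.0 - p))


def expected_calibration_error(scores: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted gap between mean predicted score and observed positive rate per bin."""
    n = len(scores)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # The last bin is closed on the right so a score of exactly 1.0 is counted.
        idx = [i for i, s in enumerate(scores)
               if lo <= s < hi or (b == n_bins - 1 and s == hi)]
        if not idx:
            continue
        conf = sum(scores[i] for i in idx) / len(idx)
        acc = sum(labels[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece


if __name__ == "__main__":
    brief = [[1.0, 0.0], [0.0, 1.0]]          # toy section embeddings of a project brief
    good_resume = [[1.0, 0.0], [0.0, 1.0]]    # sections aligned with the brief
    poor_resume = [[-1.0, 0.0], [0.0, -1.0]]  # sections pointing the other way
    s_good, s_poor = fit_score(brief, good_resume), fit_score(brief, poor_resume)
    print(f"good={s_good:.3f} poor={s_poor:.3f}")
    print("loss vs teacher 0.8:", round(distillation_bce(s_good, 0.8), 4))
```

In a trained model, `scale` and `bias` would be learned and the section vectors would come from a text encoder; here they are fixed toy values so the example runs with the standard library only. Encoding sections separately keeps each encoder input short, which is one plausible reading of how decomposition limits long-context cost.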
