CL · Apr 30, 2025

TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval

arXiv:2504.21547v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the specific problem of aiding librarians in tagging library records, representing an incremental improvement over existing methods.

The paper frames the assignment of subject tags to library records as a two-stage information retrieval task: a bi-encoder extracts candidate tags, and a cross-encoder re-ranks them. The authors report improved recall compared to single-stage methods and competitive results in qualitative evaluation.

We present our submission to Task 5 of SemEval-2025, which aims to aid librarians in assigning subject tags to library records by producing a list of likely relevant tags for a given document. We frame the task as an information retrieval problem, where the document content is used to retrieve subject tags from a large subject taxonomy. We leverage two types of encoder models to build a two-stage information retrieval system -- a bi-encoder for coarse-grained candidate extraction at the first stage, and a cross-encoder for fine-grained re-ranking at the second stage. This approach proved effective, demonstrating significant improvements in recall compared to single-stage methods and showing competitive results according to qualitative evaluation.
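The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the neural bi-encoder and cross-encoder are replaced here by toy stand-ins (a bag-of-words cosine similarity and a token-overlap score), and all function names, the parameters `k` and `top_n`, and the example tags are hypothetical. It only shows the control flow: score every tag cheaply, keep a shortlist, then re-score the shortlist with a scorer that sees the document and tag together.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: encodes a text on its own as a bag of words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_candidates(document, tags, k):
    """Stage 1 (coarse): compare independently encoded document and tags, keep the top k."""
    doc_vec = embed(document)
    scored = [(cosine(doc_vec, embed(tag)), tag) for tag in tags]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in scored[:k]]


def joint_score(document, tag):
    """Toy stand-in for a cross-encoder: scores the (document, tag) pair jointly.

    Assumption: the score is the fraction of tag tokens found in the document.
    """
    doc_tokens = set(document.lower().split())
    tag_tokens = tag.lower().split()
    if not tag_tokens:
        return 0.0
    return sum(t in doc_tokens for t in tag_tokens) / len(tag_tokens)


def rerank(document, candidates, top_n):
    """Stage 2 (fine): re-score only the shortlisted candidates and keep the top n."""
    ranked = sorted(candidates, key=lambda tag: joint_score(document, tag), reverse=True)
    return ranked[:top_n]


def tag_document(document, tags, k=3, top_n=2):
    """Run both stages: candidate extraction followed by re-ranking."""
    return rerank(document, retrieve_candidates(document, tags, k), top_n)


if __name__ == "__main__":
    taxonomy = [
        "machine translation",
        "neural networks",
        "organic chemistry",
        "medieval history",
        "low resource languages",
    ]
    record = "neural machine translation for low resource languages"
    print(tag_document(record, taxonomy))
```

The design point is cost: the first stage can encode tags once and compare them to each document cheaply, while the more expensive joint scorer only runs on the `k` shortlisted candidates. In a real system the two toy scorers would be replaced by trained encoder models.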
