DL AI CL IR · Dec 30, 2024

ACL-rlg: A Dataset for Reading List Generation

Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, Richard Dufour

arXiv:2502.15692v1 · 13.419 citations · h-index: 24 · COLING

Originality: Synthesis-oriented

AI Analysis

This work addresses researchers' need for structured literature overviews, but its contribution is incremental: it focuses on dataset creation and baseline evaluation.

The authors tackle the problem of generating reading lists for scientific fields by introducing ACL-rlg, the largest open expert-annotated reading list dataset. They find that traditional scholarly search engines perform poorly on this task, while GPT-4o performs better but shows signs of potential data contamination.

Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large number of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study shows that traditional scholarly search engines and indexing methods perform poorly on this task, and that GPT-4o, despite showing better results, exhibits signs of potential data contamination.
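Framing reading list generation as a retrieval task suggests scoring a system's ranked output against an expert-compiled list. The following is a minimal sketch of that idea, not the paper's evaluation protocol: the choice of precision and recall at k, the function name `precision_recall_at_k`, and the paper identifiers are all assumptions made for illustration.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Score the top-k retrieved paper ids against an expert reading list.

    Assumed metric definitions (not taken from the paper):
    precision@k = hits / k, recall@k = hits / len(relevant).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # De-duplicate while keeping rank order, then cut off at k.
    top_k = list(dict.fromkeys(ranked))[:k]
    hits = sum(1 for paper_id in top_k if paper_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
expert_list = {"paper_a", "paper_b", "paper_c", "paper_d"}
retrieved = ["paper_b", "paper_x", "paper_a", "paper_y", "paper_z"]

p, r = precision_recall_at_k(retrieved, expert_list, k=5)
print(f"P@5={p:.2f} R@5={r:.2f}")
```

Here two of the five retrieved papers appear in the expert list of four, so the sketch reports a precision of 0.40 and a recall of 0.50. A search engine or a language model could be compared on the same expert lists by swapping in its ranked output for `retrieved`.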
