DB · AI · IR · May 28, 2025

ChatPD: An LLM-driven Paper-Dataset Networking System

arXiv:2505.22349v1 · 8 citations · h-index: 14 · Has Code · KDD
Originality: Incremental advance
AI Analysis

This system addresses the bottleneck of dataset discovery for researchers, though the contribution is incremental because it builds on existing platforms such as PapersWithCode.

The authors tackled the inefficiency of manual dataset management in academic platforms by developing ChatPD, an LLM-driven system that automates dataset information extraction from papers and constructs a paper-dataset network, achieving about 90% precision and recall in entity resolution tasks.
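The "about 90% precision and recall" figure refers to how well dataset descriptions are mapped to the correct dataset entities. As a rough illustration of what those two numbers measure, the sketch below uses the standard definitions over a description-to-entity mapping. This is an assumption about the evaluation setup, not the paper's exact protocol, and the description ids and entity names are hypothetical.

```python
from typing import Dict, Tuple


def precision_recall(predicted: Dict[str, str], gold: Dict[str, str]) -> Tuple[float, float]:
    """Standard precision/recall for a description -> entity mapping (assumed setup).

    predicted: description id -> entity id chosen by the system
               (descriptions the system left unresolved are simply absent)
    gold:      description id -> correct entity id from human annotation
    """
    # A prediction counts as correct only if it names the annotated entity.
    correct = sum(1 for desc, entity in predicted.items() if gold.get(desc) == entity)
    precision = correct / len(predicted) if predicted else 0.0  # share of predictions that are right
    recall = correct / len(gold) if gold else 0.0               # share of gold links recovered
    return precision, recall


# Hypothetical example: 3 predictions, 2 of them correct, 4 gold links in total.
predicted = {"d1": "CIFAR-10", "d2": "ImageNet", "d3": "MNIST"}
gold = {"d1": "CIFAR-10", "d2": "ImageNet", "d3": "Fashion-MNIST", "d4": "SQuAD"}
p, r = precision_recall(predicted, gold)
```

Here precision is 2/3 (one wrong mapping) and recall is 0.5 (one wrong mapping plus one description never resolved), which shows why a system must score well on both to reach the reported level.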

Scientific research heavily depends on suitable datasets for method validation, but existing academic platforms with dataset management, such as PapersWithCode, suffer from inefficiencies in their manual workflow. To overcome this bottleneck, we present a system, called ChatPD, that uses Large Language Models (LLMs) to automate dataset information extraction from academic papers and construct a structured paper-dataset network. Our system consists of three key modules: paper collection, dataset information extraction, and dataset entity resolution, which together construct the paper-dataset network. Specifically, we propose a Graph Completion and Inference strategy to map dataset descriptions to their corresponding entities. Through extensive experiments, we demonstrate that ChatPD not only outperforms the existing platform PapersWithCode in dataset usage extraction but also achieves about 90% precision and recall in entity resolution tasks. Moreover, we have deployed ChatPD to continuously extract which datasets are used in papers and to provide dataset discovery services, such as task-specific dataset queries and similar dataset recommendations. We open source ChatPD and the current paper-dataset network in our GitHub repository (https://github.com/ChatPD-web/ChatPD).
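To make the shape of the pipeline's output concrete, here is a minimal sketch of the last two stages: resolving extracted dataset descriptions to entities and storing the resulting paper-dataset network so it can answer a task-specific query. It is a deliberately simple rule-based stand-in (merge descriptions that share a normalized name or URL, via union-find), not the authors' Graph Completion and Inference strategy, and it skips the LLM extraction step entirely. The `DatasetMention` schema, its fields, and the matching rule are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class DatasetMention:
    # One dataset description extracted from a paper (hypothetical schema).
    paper_id: str
    name: str
    task: str
    url: Optional[str] = None


def normalize(name: str) -> str:
    # Lowercase and drop non-alphanumerics so "CIFAR-10" and "cifar10" collide.
    return re.sub(r"[^a-z0-9]", "", name.lower())


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def resolve(mentions: List[DatasetMention]) -> List[int]:
    """Give each mention an entity id; mentions sharing a normalized name or URL merge."""
    uf = UnionFind(len(mentions))
    first_seen: Dict[str, int] = {}
    for i, m in enumerate(mentions):
        keys = ["name:" + normalize(m.name)]
        if m.url:
            keys.append("url:" + m.url.rstrip("/").lower())
        for key in keys:
            if key in first_seen:
                uf.union(i, first_seen[key])  # link to the earlier mention with this key
            else:
                first_seen[key] = i
    return [uf.find(i) for i in range(len(mentions))]


def build_network(mentions: List[DatasetMention], entity_ids: List[int]) -> Dict[int, Set[str]]:
    # Bipartite paper-dataset network, stored as entity id -> ids of papers using it.
    papers_by_entity: Dict[int, Set[str]] = defaultdict(set)
    for m, e in zip(mentions, entity_ids):
        papers_by_entity[e].add(m.paper_id)
    return papers_by_entity


def datasets_for_task(mentions: List[DatasetMention], entity_ids: List[int], task: str) -> Set[int]:
    # Task-specific dataset query: entities with at least one mention tagged with the task.
    return {e for m, e in zip(mentions, entity_ids) if m.task == task}


# Hypothetical mentions: three spellings of one dataset plus an unrelated one.
mentions = [
    DatasetMention("p1", "CIFAR-10", "image classification", "https://www.cs.toronto.edu/~kriz/cifar.html"),
    DatasetMention("p2", "cifar10", "image classification"),
    DatasetMention("p3", "CIFAR 10 benchmark", "image classification", "https://www.cs.toronto.edu/~kriz/cifar.html/"),
    DatasetMention("p4", "SQuAD", "question answering"),
]
entity_ids = resolve(mentions)
network = build_network(mentions, entity_ids)
```

The first three mentions end up as one entity: "cifar10" matches by name, and "CIFAR 10 benchmark" matches only through its URL. Exact-key matching like this misses descriptions that share neither a name nor a link, which is the kind of gap a graph-based inference step is meant to close.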

Code Implementations: 1 repo