DL, CL · Mar 27, 2023

CoCon: A Data Set on Combined Contextualized Research Artifact Use

arXiv:2303.15193v1 · 2 citations · h-index: 19 · Has Code
Originality: Synthesis-oriented
AI Analysis

This data set addresses the need for more comprehensive tools that help researchers cope with information overload in academia. Its contribution is incremental: it extends existing work by adding contextualized artifact use.

The authors address the limited granularity of existing research artifact analysis, which typically operates on whole papers or on a single artifact type. To do so, they built CoCon, a large scholarly data set of 35,000 artifacts and 340,000 publications intended to enable more holistic search and recommendation systems.

In the wake of information overload in academia, methodologies and systems for search, recommendation, and prediction that help researchers identify relevant research are actively studied and developed. Existing work, however, is limited in terms of granularity, focusing only on the level of papers or on a single type of artifact, such as data sets. To enable more holistic analyses and systems dealing with academic publications and their content, we propose CoCon, a large scholarly data set reflecting the combined use of research artifacts, contextualized in academic publications' full text. Our data set comprises 35k artifacts (data sets, methods, models, and tasks) and 340k publications. We additionally formalize a link prediction task for "combined research artifact use prediction" and provide code to support analyses of, and the development of ML applications on, our data. All data and code are publicly available at https://github.com/IllDepence/contextgraph.
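To make the idea of "combined research artifact use prediction" more concrete, here is a minimal sketch under stated assumptions. It treats artifacts (data sets, methods, models, tasks) as nodes, connects two artifacts whenever a publication uses both, and ranks not-yet-observed pairs with a simple common-neighbors heuristic. The toy input, the artifact names, the function names, and the choice of heuristic are all hypothetical; this does not use the CoCon file format or the code in the contextgraph repository, ignores the full-text context that CoCon provides, and the paper's own task formalization may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: each publication maps to the set of artifacts it uses.
# These names are illustrative only; they are not CoCon identifiers.
papers = {
    "p1": {"ImageNet", "ResNet", "image classification"},
    "p2": {"ImageNet", "ViT", "image classification"},
    "p3": {"COCO", "ResNet", "object detection"},
}


def build_cooccurrence_graph(paper_artifacts):
    """Return an adjacency map: artifact -> artifacts used together with it in some paper."""
    adj = defaultdict(set)
    for artifacts in paper_artifacts.values():
        for a, b in combinations(sorted(artifacts), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def common_neighbor_score(adj, a, b):
    """Count artifacts that have already been combined with both a and b."""
    return len(adj.get(a, set()) & adj.get(b, set()))


def rank_candidate_links(adj):
    """Score every artifact pair not yet combined in any paper, highest score first."""
    nodes = sorted(adj)
    candidates = [
        (common_neighbor_score(adj, a, b), a, b)
        for a, b in combinations(nodes, 2)
        if b not in adj[a]  # only pairs without an observed combined use
    ]
    # Sort by descending score, then alphabetically for a deterministic order.
    return sorted(candidates, key=lambda t: (-t[0], t[1], t[2]))


if __name__ == "__main__":
    graph = build_cooccurrence_graph(papers)
    for score, a, b in rank_candidate_links(graph)[:3]:
        print(score, a, b)
```

On this toy input, the top-ranked candidate is the pair ResNet and ViT, because both have already been combined with ImageNet and image classification. A realistic evaluation would more likely hold out combinations from later publications and check whether they are ranked highly; a learned model could replace the common-neighbors score without changing the overall setup.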

Code Implementations: 1 repo