SI · DL · Apr 17

Making the complete OpenAIRE citation graph easily accessible through compact data representation

arXiv:2602.12206 · 10.1 · h-index: 3
Predicted impact top 70% in SI · last 90 days
Originality Synthesis-oriented
AI Analysis

This makes a large citation network accessible to researchers with limited computational resources, enabling broader community use.

The authors compressed the OpenAIRE citation graph (200M+ publications, 2B+ citations) from ~TB to 32GB while preserving full graph structure, and provided a simple format and Python pipeline for easy access and processing.

The OpenAIRE graph contains a large citation dataset, with over 200 million publications and over 2 billion citations. The graph is currently available as a dump with metadata, which totals ~TB uncompressed. This makes it hard to process on conventional computers. To make this network more accessible to the community, we provide a processed OpenAIRE graph that is reduced to 32GB while preserving the full graph structure. We also offer the processed data in a very simple format, which allows straightforward further manipulation, and a Python pipeline that can be used to process future releases of the OpenAIRE graph.
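The abstract does not describe the storage format itself, so the following is only an illustrative sketch of one common way to hold a citation graph compactly: map each publication identifier to a dense integer index and pack the edges into compressed sparse row (CSR) arrays. The function names (`build_csr`, `references_of`, `estimated_bytes`), the choice of CSR, and the integer widths are assumptions made for this example, not the authors' format or pipeline.

```python
import numpy as np


def build_csr(ids, edges):
    """Pack (citing, cited) identifier pairs into CSR arrays.

    Hypothetical helper: `ids` is a list of unique publication identifiers,
    `edges` a list of (citing_id, cited_id) tuples.
    """
    # Dense integer index per publication; 200M+ nodes still fit in uint32.
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    m = len(edges)

    src = np.fromiter((index[s] for s, _ in edges), dtype=np.uint32, count=m)
    dst = np.fromiter((index[t] for _, t in edges), dtype=np.uint32, count=m)

    # Group edges by citing node; a stable sort keeps the input order
    # of each node's references.
    order = np.argsort(src, kind="stable")
    targets = dst[order]

    # offsets[i]:offsets[i + 1] is the slice of `targets` cited by node i.
    counts = np.bincount(src.astype(np.intp), minlength=n)
    offsets = np.zeros(n + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    return index, offsets, targets


def references_of(node, offsets, targets):
    """Return the integer indices of the publications cited by `node`."""
    return targets[offsets[node]:offsets[node + 1]]


def estimated_bytes(n_nodes, n_edges):
    """Size of the two CSR arrays: int64 offsets plus uint32 targets."""
    return (n_nodes + 1) * 8 + n_edges * 4


# Toy example with made-up identifiers.
ids = ["pub:a", "pub:b", "pub:c"]
edges = [("pub:a", "pub:b"), ("pub:c", "pub:a"), ("pub:a", "pub:c")]
index, offsets, targets = build_csr(ids, edges)
print(references_of(index["pub:a"], offsets, targets))
print(estimated_bytes(200_000_000, 2_000_000_000) / 1e9, "GB")
```

Under these assumptions the edge structure alone for 200 million nodes and 2 billion edges needs roughly 10 GB, which gives a sense of why integer indexing shrinks a graph so much compared with a metadata dump. The identifier-to-index mapping and any retained metadata would add to that figure.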

