Making the complete OpenAIRE citation graph easily accessible through compact data representation
This makes a large citation network accessible to researchers with limited computational resources, enabling broader community use.
The authors compressed the OpenAIRE citation graph (200M+ publications, 2B+ citations) from ~TB to 32GB while preserving full graph structure, and provided a simple format and Python pipeline for easy access and processing.
The OpenAIRE graph contains a large citation graph dataset, with over 200 million publications and over 2 billion citations. The current graph is available as a dump with metadata whose uncompressed size totals ~TB, which makes it hard to process on conventional computers. To make this network more accessible to the community, we provide a processed OpenAIRE graph that is reduced to 32GB while preserving the full graph structure. In addition, we offer the processed data in a very simple format that allows straightforward further manipulation. We also provide a Python pipeline that can be used to process future releases of the OpenAIRE graph.
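The text above does not specify the compact format, so the following is only an illustrative sketch of one common way to store a citation graph compactly: replace long publication identifiers with dense integer indices and keep the edges as a flat binary array of 64-bit integers. All function names, the file name `edges.bin`, and the toy identifiers are hypothetical and are not taken from the authors' pipeline. As a back-of-the-envelope estimate under this assumed layout, each citation costs 16 bytes (two 8-byte integers), so roughly 2 billion citations would occupy on the order of 32 GB; this is not a description of the published format.

```python
from array import array
import os
import tempfile

def build_id_map(edges):
    """Assign a dense integer index to each publication ID in first-seen order."""
    id_to_index = {}
    for src, dst in edges:
        for pub_id in (src, dst):
            if pub_id not in id_to_index:
                id_to_index[pub_id] = len(id_to_index)
    return id_to_index

def encode_edges(edges, id_to_index):
    """Flatten edges into a signed 64-bit array: [src0, dst0, src1, dst1, ...]."""
    flat = array("q")
    for src, dst in edges:
        flat.append(id_to_index[src])
        flat.append(id_to_index[dst])
    return flat

def write_edges(flat, path):
    """Write the flat edge array as raw bytes (native byte order)."""
    with open(path, "wb") as f:
        flat.tofile(f)

def read_edges(path):
    """Read a raw 64-bit edge file back into an array."""
    flat = array("q")
    n_items = os.path.getsize(path) // flat.itemsize
    with open(path, "rb") as f:
        flat.fromfile(f, n_items)
    return flat

def to_csr(flat, n_nodes):
    """Build CSR-style (offsets, targets) arrays of outgoing citations."""
    offsets = array("q", [0] * (n_nodes + 1))
    for i in range(0, len(flat), 2):
        offsets[flat[i] + 1] += 1          # count out-degree of each source
    for v in range(n_nodes):
        offsets[v + 1] += offsets[v]       # prefix sum gives row starts
    targets = array("q", [0] * (len(flat) // 2))
    cursor = array("q", offsets[:n_nodes])  # next free slot per source
    for i in range(0, len(flat), 2):
        src, dst = flat[i], flat[i + 1]
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

# Toy example with made-up identifiers: (citing, cited) pairs.
toy_edges = [("pub_a", "pub_b"), ("pub_a", "pub_c"), ("pub_c", "pub_b")]
id_map = build_id_map(toy_edges)
flat = encode_edges(toy_edges, id_map)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.bin")  # hypothetical file name
    write_edges(flat, path)
    size_bytes = os.path.getsize(path)     # 16 bytes per edge
    restored = read_edges(path)

offsets, targets = to_csr(restored, len(id_map))

# Publications cited by "pub_a", as integer indices.
a = id_map["pub_a"]
cited_by_a = list(targets[offsets[a]:offsets[a + 1]])
```

In this layout the full graph structure is preserved because every edge is kept; only the bulky metadata and string identifiers are dropped or mapped out. The CSR arrays make neighbor lookup a simple slice, which is one reason such formats are easy to process on modest hardware.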