DL DB May 19

One in Eight OpenAlex Abstracts Has Integrity Issues

arXiv:2605.20168
AI Analysis

For computational metascience researchers relying on OpenAlex, this finding points to a substantial data quality problem that could affect downstream analyses.

12% of OpenAlex abstracts have integrity issues, primarily insufficient content and misplaced metadata, based on a two-stage annotation of 10,000 samples.

Scientific abstracts are increasingly used as primary data in computational metascience research, yet the quality of these abstracts in widely used bibliographic databases has not been systematically examined. We assess the integrity of 10,000 randomly sampled English-language journal abstracts from OpenAlex using a two-stage annotation protocol combining human expert review and large language model classification. We identify seven distinct failure modes and find that 12% of abstracts have integrity issues, with insufficient content and misplaced metadata being the most prevalent. We discuss implications for downstream research and describe a forthcoming community portal to support collective annotation efforts.
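
To give a rough sense of how precisely a sample of this size pins down the headline rate, the sketch below computes a Wilson score interval for a proportion. It is an illustration only, not the authors' analysis: it assumes the 12% figure corresponds to about 1,200 flagged abstracts out of the 10,000 sampled, treats the sample as a simple random one, and ignores any error introduced by the two-stage human and LLM annotation. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed counts: 12% of 10,000 is taken to be roughly 1,200 flagged abstracts.
low, high = wilson_interval(successes=1200, n=10_000)
print(f"Approximate 95% interval: {low:.3%} to {high:.3%}")
```

Under these assumptions the interval is narrow, on the order of a percentage point wide, so sampling noise alone is unlikely to change the "one in eight" reading; annotation error is a separate question that this sketch does not address.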

