AI · Jun 7, 2020

An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web

arXiv:2006.04161v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the logistical challenges that biomedical researchers face in data integration, but it is incremental because it builds on existing linked data technologies.

The paper tackles semantic heterogeneity in biomedical linked open data by analyzing more than 80 sources. It finds that many sources are stand-alone, rely on unpublished schemas, and contain elements of limited use for data integration.

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.
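The meta-analysis works on schema elements (classes and properties) extracted from each source. The Python sketch below is a minimal, hypothetical illustration of two checks the abstract mentions: schema reuse across sources, and stand-alone sources that are not linked to any other source. It runs on toy in-memory triples. The source names, the `example.org` namespaces, and the rule "a link exists when an object IRI starts with another source's namespace" are assumptions made for illustration. This is not the authors' actual extraction method.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical toy sources: name -> (namespace, list of (subject, predicate, object)).
SOURCES = {
    "drugs": ("http://example.org/drugs/", [
        ("http://example.org/drugs/d1", RDF_TYPE, "http://example.org/drugs/vocab/Drug"),
        ("http://example.org/drugs/d1", "http://example.org/drugs/vocab/target",
         "http://example.org/genes/g1"),
    ]),
    "genes": ("http://example.org/genes/", [
        ("http://example.org/genes/g1", RDF_TYPE, "http://example.org/genes/vocab/Gene"),
        ("http://example.org/genes/g1", RDFS_LABEL, "BRCA1"),
    ]),
    "trials": ("http://example.org/trials/", [
        ("http://example.org/trials/t1", RDF_TYPE, "http://example.org/trials/vocab/Trial"),
        ("http://example.org/trials/t1", RDFS_LABEL, "Trial 1"),
    ]),
}


def extract_schema(triples):
    """Return the classes (objects of rdf:type) and properties (other predicates) a source uses."""
    classes = {o for _, p, o in triples if p == RDF_TYPE}
    properties = {p for _, p, _ in triples if p != RDF_TYPE}
    return classes, properties


def build_schema_graph(sources):
    """Map each schema element IRI to the set of sources that use it."""
    used_by = defaultdict(set)
    for name, (_, triples) in sources.items():
        classes, properties = extract_schema(triples)
        for element in classes | properties:
            used_by[element].add(name)
    return used_by


def reused_elements(used_by):
    """Schema elements that appear in more than one source."""
    return {element for element, srcs in used_by.items() if len(srcs) > 1}


def outgoing_links(sources):
    """Map each source to the other sources whose namespace its object IRIs point into."""
    links = {name: set() for name in sources}
    for name, (_, triples) in sources.items():
        for _, _, o in triples:
            for other, (namespace, _) in sources.items():
                if other != name and o.startswith(namespace):
                    links[name].add(other)
    return links


def stand_alone(sources):
    """Sources with neither outgoing nor incoming links to any other source."""
    linked = set()
    for src, targets in outgoing_links(sources).items():
        if targets:
            linked.add(src)
            linked |= targets
    return set(sources) - linked


print(sorted(stand_alone(SOURCES)))  # ['trials']
print(sorted(reused_elements(build_schema_graph(SOURCES))))
# ['http://www.w3.org/2000/01/rdf-schema#label']
```

In this toy data, only the standard `rdfs:label` property is shared across sources, and the `trials` source is isolated. Those two patterns mirror, on a tiny scale, the "minimal reuse" and "stand-alone" observations in the abstract.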

Code Implementations: 1 repo