IR · Aug 9, 2016

SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines

arXiv:1608.02761v2 · 20 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of discovering and monitoring linked data sources, a task relevant to both researchers and practitioners. Its contribution is incremental, since it builds on existing endpoint repositories.

The study addresses the discovery of linked data SPARQL endpoints on the Web. It proposes a metacrawling method, implemented in the SpEnD system, that first discovers relevant search keywords and then submits them to search engines to locate endpoints. The method found most of the endpoints already listed in existing repositories, along with many new ones.

In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, these search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. Finally, we have developed a new SPARQL endpoint crawler (SpEC) for crawling and link analysis.
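The abstract does not say how a URL returned by a search engine is confirmed to be a SPARQL endpoint. One plausible check, sketched here as an assumption rather than as SpEnD's actual procedure, is to send a trivial ASK query using the SPARQL 1.1 Protocol's `query` parameter and inspect the response. The names `PROBE_QUERY`, `build_probe_url`, and `looks_like_endpoint` are hypothetical, and the HTTP request itself is left out so the logic can be read and tested offline.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical probe: a minimal ASK query that any non-empty store can answer.
PROBE_QUERY = "ASK { ?s ?p ?o }"

# Standard SPARQL query-result media types (JSON and XML serializations).
SPARQL_RESULT_TYPES = (
    "application/sparql-results+json",
    "application/sparql-results+xml",
)


def build_probe_url(candidate_url: str) -> str:
    """Return the candidate URL with the probe added as a `query` GET parameter."""
    parts = urlsplit(candidate_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("query", PROBE_QUERY))
    # Drop any fragment; keep scheme, host, path, and existing parameters.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


def looks_like_endpoint(status: int, content_type: str, body: str) -> bool:
    """Heuristically decide whether an HTTP response came from a SPARQL endpoint."""
    if status != 200:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in SPARQL_RESULT_TYPES:
        return True
    # Assumption: some servers label results as generic JSON, so accept that
    # only when the body carries the ASK-style "boolean" field.
    if media_type == "application/json":
        try:
            return isinstance(json.loads(body).get("boolean"), bool)
        except (ValueError, AttributeError):
            return False
    return False
```

A crawler could fetch `build_probe_url(url)` for each candidate URL and pass the status code, `Content-Type` header, and body to `looks_like_endpoint`. Ordinary HTML pages fail the check, which keeps search-result noise out of the endpoint list.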
