IR, Aug 9, 2016

SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines

Semih Yumusak, Erdogan Dogdu, Halife Kodaz, Andreas Kamilaris

arXiv:1608.02761v22.720 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of discovering and monitoring linked data sources, which matters to both researchers and practitioners. The advance is incremental, since it builds on existing endpoint repositories.

The study tackles the problem of discovering linked data SPARQL endpoints on the Web. It proposes a metacrawling method, implemented in the SpEnD system, that first discovers relevant search keywords and then submits them to search engines to locate endpoints. The method found most of the endpoints already listed in existing repositories, along with many new ones.

In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, these search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. Finally, we have developed a new SPARQL endpoint crawler (SpEC) for crawling and link analysis.
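The metacrawling loop described above can be illustrated with a minimal Python sketch. It is not the SpEnD implementation. The abstract does not say how a candidate URL is confirmed as an endpoint, so the probe below (an ASK query sent per the SPARQL 1.1 Protocol) is an assumption. The names `discover_endpoints`, `http_probe`, and `probe_url` are hypothetical, and `search` stands in for whatever search engine client is available; no real search API is used here.

```python
from http.client import HTTPException
from typing import Callable, Iterable, List
from urllib.parse import urlencode, urlsplit
from urllib.request import Request, urlopen

# Assumed probe: a cheap query that any SPARQL endpoint should be able to answer.
PROBE_QUERY = "ASK { ?s ?p ?o }"


def probe_url(candidate: str) -> str:
    """Append the probe query to a candidate URL as a 'query' GET parameter."""
    separator = "&" if urlsplit(candidate).query else "?"
    return candidate + separator + urlencode({"query": PROBE_QUERY})


def looks_like_sparql_response(content_type: str, body: str) -> bool:
    """Heuristic check: SPARQL results media type, or a boolean result in the body."""
    if "sparql-results" in content_type.lower():
        return True
    return '"boolean"' in body or "<boolean>" in body


def http_probe(candidate: str, timeout: float = 10.0) -> bool:
    """Send the probe query over HTTP; treat any network failure as 'not an endpoint'."""
    request = Request(
        probe_url(candidate),
        headers={"Accept": "application/sparql-results+json, application/sparql-results+xml"},
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            body = response.read(4096).decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        return False
    return looks_like_sparql_response(content_type, body)


def discover_endpoints(
    keywords: Iterable[str],
    search: Callable[[str], Iterable[str]],  # hypothetical search engine client
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Run every keyword through `search`, probe each new URL once, keep the hits."""
    seen = set()
    found = []
    for keyword in keywords:
        for url in search(keyword):
            if url in seen:
                continue
            seen.add(url)
            if probe(url):
                found.append(url)
    return found
```

Passing `search` and `probe` as callables keeps the loop independent of any particular search engine and lets it be exercised offline with stubs. A real crawler would also need rate limiting and result pagination, which are omitted here.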
