IR
Aug 6, 2014
Unstable markup: A template-based information extraction from web sites with unstable markup
Maxim Kolchin, Fedor Kozlov
This paper presents the results of our work on crawling the CEUR Workshop proceedings web site into a Linked Open Data (LOD) dataset, carried out within the ESWC 2014 Semantic Publishing Challenge. Our approach uses an extensible, template-dependent crawler, together with DBpedia for linking extracted entities such as the names of universities and countries.
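To make the idea of a template-dependent crawler concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each "template" is simply a regular expression tuned to one markup variant of a proceedings page. The extractor tries the templates in a fixed order and keeps the first match, which is one way to cope with unstable markup. The template names, the `CEURVOLTITLE` class, and the helpers `extract_title` and `naive_dbpedia_uri` are hypothetical and chosen only for illustration. The DBpedia step is deliberately naive: it only builds a resource URI from a label.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class Template:
    """One markup variant (hypothetical): a name plus a regex with one capture group."""
    name: str
    pattern: re.Pattern


# Hypothetical templates, tried in order. Supporting a new markup variant
# means appending a template here, not rewriting the existing ones.
TEMPLATES = [
    # Assumed variant A: the volume title is wrapped in a span with a CSS class.
    Template("span-class", re.compile(r'<span class="CEURVOLTITLE">(.*?)</span>', re.S)),
    # Assumed variant B: older pages that only put the title in an <h1>.
    Template("h1", re.compile(r"<h1>(.*?)</h1>", re.S)),
]


def extract_title(html: str) -> Optional[tuple[str, str]]:
    """Return (template name, cleaned title) from the first matching template, or None."""
    for template in TEMPLATES:
        match = template.pattern.search(html)
        if match:
            # Strip any nested tags, then collapse runs of whitespace.
            text = re.sub(r"<[^>]+>", "", match.group(1))
            return template.name, " ".join(text.split())
    return None


def naive_dbpedia_uri(label: str) -> str:
    """Build a DBpedia resource URI from a label (assumption: exact-label match only)."""
    return "http://dbpedia.org/resource/" + quote(label.strip().replace(" ", "_"))


if __name__ == "__main__":
    page = '<html><span class="CEURVOLTITLE">Semantic  Publishing</span></html>'
    print(extract_title(page))
    print(naive_dbpedia_uri("Saint Petersburg"))
```

Putting the templates in an ordered list keeps the crawler extensible: when the site's markup drifts, you add a new variant and leave the old ones in place. A realistic linking step would query DBpedia, for example through SPARQL, and disambiguate the candidates. Building a URI from a string, as the sketch does, is only a placeholder for that step.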