Ranking Triples using Entity Links in a Large Web Crawl - The Chicory Triple Scorer at WSDM Cup 2017
This work addresses the triple ranking problem, which is relevant to information retrieval and knowledge base construction, but it is incremental: it builds on existing entity linking data and baseline methods.
The paper tackles the triple ranking problem of the WSDM Cup 2017 challenge by estimating relevance from entity links in a large web crawl (ClueWeb12 with the FACC1 annotations), combined with a baseline computed from Wikipedia abstracts. The implementation is generated automatically from a declarative search strategy.
This paper describes the participation of team Chicory in the Triple Ranking Challenge of the WSDM Cup 2017. Our approach uses a large collection of entity-tagged web data to estimate the correctness of the relevance relation expressed by the triples, in combination with a baseline approach using Wikipedia abstracts following [1]. Relevance estimates are drawn from ClueWeb12 as annotated by Google's entity linker, which is publicly available as the FACC1 dataset. Our implementation is automatically generated from a so-called 'search strategy' that specifies declaratively how the input data are combined into a final ranking of triples.
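To make the general idea concrete, the following is a minimal sketch of how co-occurrence evidence from entity-linked documents might be combined with a baseline score. It is not the authors' search strategy: the function names, the per-subject normalization, the linear interpolation, and the `weight` parameter are all assumptions chosen for illustration, and the toy documents stand in for entity-annotated web pages.

```python
from collections import Counter


def cooccurrence_scores(documents, candidates):
    """Score (subject, object) pairs by document-level co-occurrence.

    documents:  iterable of sets of entity ids linked in each document
                (hypothetical stand-in for entity-annotated web pages).
    candidates: dict mapping a subject to its list of candidate objects.
    Scores are normalized per subject so they sum to 1 (an assumption).
    """
    counts = Counter()
    for entities in documents:
        for subject, objects in candidates.items():
            if subject not in entities:
                continue
            for obj in objects:
                if obj in entities:
                    counts[(subject, obj)] += 1

    scores = {}
    for subject, objects in candidates.items():
        total = sum(counts[(subject, obj)] for obj in objects)
        for obj in objects:
            scores[(subject, obj)] = counts[(subject, obj)] / total if total else 0.0
    return scores


def combine(web, baseline, weight=0.5):
    """Linearly interpolate web evidence with a baseline score (assumed scheme)."""
    triples = set(web) | set(baseline)
    return {
        t: weight * web.get(t, 0.0) + (1 - weight) * baseline.get(t, 0.0)
        for t in triples
    }


def rank(scores):
    """Return triples ordered by descending score, ties broken by name."""
    return sorted(scores, key=lambda t: (-scores[t], t))
```

Under these assumptions, a subject linked in three documents alongside one candidate and in one document alongside another would rank the first candidate higher, unless the baseline strongly disagrees.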