IR · Dec 22, 2017

Finding People's Professions and Nationalities Using Distant Supervision - The FMI@SU "goosefoot" team at the WSDM Cup 2017 Triple Scoring Task

arXiv:1712.08350v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific information extraction task for knowledge base construction, but its contribution is incremental: it combines existing methods into a competition entry.

The paper tackles the problem of scoring the relevance of profession and nationality triples using distant supervision. Its system ranked first on Kendall's Tau among the 21 teams in the WSDM Cup 2017 Triple Scoring task.

We describe the system that our FMI@SU student team built for participating in the Triple Scoring task at the WSDM Cup 2017. Given a triple from a "type-like" relation, profession or nationality, the goal is to produce a score, on a scale from 0 to 7, that measures the relevance of the statement expressed by the triple: e.g., how well does the profession of Actor fit Quentin Tarantino? We propose a distant supervision approach using information crawled from Wikipedia, DeletionPedia, and DBpedia, together with task-specific word embeddings, TF-IDF weights, and role occurrence order, which we combine in a linear regression model. The official evaluation ranked our submission 1st on Kendall's Tau, 7th on Average score difference, and 9th on Accuracy, out of 21 participating teams.
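To make the scoring setup concrete, here is a minimal sketch of how a linear combination of features could be mapped onto the 0 to 7 scale and then compared against gold scores with the three measures named above. Everything specific in it is an assumption for illustration, not taken from the paper: the feature names, the weights (which a real system would learn by regression), the tie-ignoring tau-a variant of Kendall's Tau, and the reading of Accuracy as "within 2 points of the gold score". The official scorer may differ on each of these.

```python
from itertools import combinations

# Hypothetical feature weights; a real system would fit these with
# linear regression on distantly supervised training data.
WEIGHTS = {"embedding_sim": 4.0, "tfidf": 2.0, "first_mention": 1.0}


def relevance_score(features):
    """Linear combination of feature values, rounded and clipped to 0..7."""
    raw = sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return max(0, min(7, round(raw)))


def kendall_tau_a(pred, gold):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither; this is a simplification and may not
    match how the official evaluation treats ties.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(pred)), 2):
        sign = (pred[i] - pred[j]) * (gold[i] - gold[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(pred) * (len(pred) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


def average_score_difference(pred, gold):
    """Mean absolute difference between predicted and gold scores."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


def accuracy_within(pred, gold, tolerance=2):
    """Fraction of predictions within `tolerance` of the gold score (assumed definition)."""
    return sum(abs(p - g) <= tolerance for p, g in zip(pred, gold)) / len(pred)


if __name__ == "__main__":
    # Toy, made-up feature values for three professions of one person.
    candidates = [
        {"embedding_sim": 1.0, "tfidf": 1.0, "first_mention": 1.0},
        {"embedding_sim": 0.5},
        {},
    ]
    predicted = [relevance_score(f) for f in candidates]
    gold = [6, 3, 1]  # made-up gold scores
    print(predicted, kendall_tau_a(predicted, gold))
```

The three measures reward different things: Kendall's Tau depends only on the relative ordering of the scores, while the other two depend on their absolute values. That is consistent with a system ranking 1st on one measure and lower on the others.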

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
