Fast End-to-End Wikification
This work addresses the need for efficient Wikification in NLP applications. Its contribution is incremental: it builds on existing redirect-based approaches, with a focus on speed.
The paper tackles the slow Wikification of large corpora by introducing RedW, a run-time-oriented solution based on Wikipedia redirects. RedW achieves competitive performance and includes an efficient confidence estimation method, so that more demanding methods can be applied only to its lower-confidence results.
Wikification of large corpora is beneficial for various NLP applications. Existing methods focus on quality rather than run-time, and are therefore infeasible for large data. Here, we introduce RedW, a run-time-oriented Wikification solution based on Wikipedia redirects that can Wikify massive corpora with competitive performance. We further propose an efficient method for estimating RedW's confidence, opening the door to applying more demanding methods only to RedW's lower-confidence results. Our experimental results support the validity of the proposed approach.
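To make the idea concrete, the following is a minimal sketch of how a redirect-based lookup and a confidence-gated cascade might fit together. It is an illustration under stated assumptions, not RedW's actual algorithm: the toy `REDIRECTS` table, the greedy longest-match strategy, the `max_len` and `threshold` parameters, and the `confidence` and `fallback` callables are all hypothetical, since the abstract does not specify how matching or confidence estimation is performed.

```python
from typing import Callable, Dict, List, Tuple

Link = Tuple[int, int, str]  # (start token, end token exclusive, page title)

# Toy redirect table (hypothetical entries): lowercased surface form -> page title.
# A real table would be built from a Wikipedia dump; that step is not shown here.
REDIRECTS: Dict[str, str] = {
    "nyc": "New York City",
    "new york city": "New York City",
    "big apple": "New York City",
    "uk": "United Kingdom",
}


def wikify(tokens: List[str], redirects: Dict[str, str], max_len: int = 3) -> List[Link]:
    """Greedy longest-match lookup of token spans in the redirect table (assumed strategy)."""
    links: List[Link] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in redirects:
                links.append((i, i + n, redirects[span]))
                i += n
                break
        else:
            i += 1  # No span starting at this token matched.
    return links


def cascade(
    links: List[Link],
    confidence: Callable[[Link], float],
    fallback: Callable[[Link], Link],
    threshold: float = 0.5,
) -> List[Link]:
    """Keep confident links; pass the rest to a costlier method (both callables are placeholders)."""
    return [link if confidence(link) >= threshold else fallback(link) for link in links]
```

Because each lookup is a dictionary access over a bounded number of candidate spans, the run-time of this sketch grows linearly with the number of tokens, which is the kind of property a run-time-oriented approach relies on. The cascade then confines the more demanding method to the links whose confidence falls below the threshold.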