Entity Extraction with Knowledge from Web Scale Corpora
This work addresses entity extraction for text mining and NLP applications, but it is incremental, as it builds on existing techniques.
The paper introduces post-processing techniques for existing dictionary-based entity extraction methods. The techniques use models trained on web-scale corpora, and the reported experiments show notable improvements in both efficiency and effectiveness.
Entity extraction is an important task in text mining and natural language processing. A popular method for entity extraction is to compare substrings of free text against a dictionary of entities. In this paper, we present several techniques that serve as a post-processing step to improve the effectiveness of this existing entity extraction technique. These techniques utilise models trained on web-scale corpora, which makes them robust and versatile. Experiments show that our techniques bring a notable improvement in efficiency and effectiveness.
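
The dictionary-based baseline mentioned above, which compares substrings of free text against a dictionary of entities, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `extract_entities`, the `max_len` parameter, the whitespace tokenisation, the longest-match policy, and the toy dictionary are all hypothetical. The post-processing techniques themselves are not shown, since the abstract does not describe them in detail.

```python
def extract_entities(text, dictionary, max_len=4):
    """Return (start_token, end_token, entity) tuples for dictionary matches.

    Hypothetical sketch: greedy longest-match over token n-grams.
    """
    # Naive tokenisation (assumption): split on whitespace, strip punctuation.
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    matches = []
    i = 0
    while i < len(tokens):
        found = False
        # Try the longest candidate substring first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                matches.append((i, i + n, candidate))
                i += n  # skip past the matched span
                found = True
                break
        if not found:
            i += 1
    return matches


# Toy dictionary of entities (illustrative only).
dictionary = {"new york", "new york times", "paris"}
result = extract_entities("She read the New York Times in Paris.", dictionary)
print(result)
```

A matcher of this kind returns every dictionary hit regardless of context, which is the sort of output a post-processing step could then filter or re-rank.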