Inflo: News Categorization and Keyphrase Extraction for Implementation in an Aggregation System
This work addresses slow content discovery for users of news aggregation systems; its contribution is incremental, since it applies existing methods to a new dataset.
The authors tackle news categorization and keyphrase extraction to speed up content discovery on aggregation platforms. They train a neural network on over 500,000 articles labeled with 12 categories, then extract keyphrases with three methods that draw on the predicted category.
The work herein describes a system for automatic news category and keyphrase labeling, motivated by the goal of improving the speed at which a user can find relevant and interesting content within an aggregation platform. A set of 12 discrete categories was applied to over 500,000 news articles to train a neural network classifier, whose predictions facilitate the more in-depth task of extracting the most significant keyphrases. Keyphrase extraction was performed with three methods (statistical, graphical, and numerical), using the pre-identified category label to improve the relevance of the extracted phrases. The results are presented in a demo in which articles are pre-populated via News API; when an article is selected, its category and keyphrase labels are computed with the methods described herein.
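The abstract does not say exactly how the statistical method uses the category label. One plausible reading, shown here only as a minimal sketch under stated assumptions, is TF-IDF scoring in which document frequencies are computed over articles sharing the predicted category, so that terms common throughout that category (for example "team" in sports coverage) are down-weighted. The function names, the stopword list, the toy corpus, and the single-word candidate scheme are all hypothetical and are not the authors' implementation. The graphical and numerical methods are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "as", "of", "to", "in", "on",
             "for", "is", "was", "with", "at", "by"}


def tokenize(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keyphrases(article, category_corpus, top_k=5):
    """Rank single-word candidates in `article` by TF-IDF.

    Assumption: `category_corpus` holds other articles with the same
    predicted category, so IDF reflects how common a term is *within*
    that category rather than across all news.
    """
    tokens = tokenize(article)
    if not tokens:
        return []
    tf = Counter(tokens)
    n_docs = len(category_corpus)

    # Document frequency: number of category articles containing each term.
    df = Counter()
    for doc in category_corpus:
        df.update(set(tokenize(doc)))

    scores = {}
    for term, count in tf.items():
        # Smoothed IDF, so terms unseen in the category corpus are still scored.
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = (count / len(tokens)) * idf

    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Toy example: a hypothetical "sports" category corpus.
sports_corpus = [
    "The team won the match in the final minute.",
    "The coach praised the team after the match.",
    "Fans celebrated the league title with the team.",
]
article = ("The striker scored twice as the team won the derby match. "
           "The striker was named player of the match.")

print(tfidf_keyphrases(article, sports_corpus, top_k=3))
# → ['striker', 'match', 'derby']
```

Because "team" appears in every sports article, its IDF falls to the minimum and it drops out of the top results, while "striker" ranks first. This illustrates, under the assumption above, how a category label could sharpen the relevance of statistical keyphrases. Multi-word phrases would need a candidate generator (for example, n-grams between stopwords) that this sketch omits.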