Fast Top-k Area Topics Extraction with Knowledge Base
This work addresses the need for efficient topic extraction in research areas and offers a practical tool for researchers and analysts, though the contribution is incremental because it builds on existing knowledge-base and optimization techniques.
The paper tackles the problem of extracting the top-k most representative research topics for a given area, such as AI, by formulating it as an NP-hard optimization problem and proposing FastKATE, a model that combines explicit and latent topic representations using a knowledge base. Experimental results on three real-world datasets show the model is effective, robust, real-time (results in <1s), and superior to alternative methods.
What are the most popular research topics in Artificial Intelligence (AI)? We formulate this question as the problem of extracting the top-$k$ topics that best represent a given area with the help of a knowledge base. We prove that the problem is NP-hard and propose an optimization model, FastKATE, that addresses it by combining explicit and latent representations of each topic. We leverage a large-scale knowledge base (Wikipedia) to generate topic embeddings using neural networks, and use these embeddings to help capture how representative each topic is of a given area. We develop a fast heuristic algorithm that solves the problem efficiently with a provable error bound. We evaluate the proposed model on three real-world datasets. Experimental results demonstrate the model's effectiveness, robustness, and real-time performance (results returned in $<1$s), as well as its superiority over several alternative methods.
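To make the top-$k$ selection idea concrete, the following is a minimal sketch of a greedy heuristic over topic embeddings. It is not the FastKATE objective or algorithm, which the abstract does not specify. The sketch assumes a facility-location style coverage score (each candidate topic is credited with its highest cosine similarity to any selected topic), and the names `cosine`, `coverage`, `greedy_top_k`, and the toy two-dimensional embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def coverage(selected, embeddings):
    """Assumed objective: sum over all topics of the best (non-negative)
    similarity to any selected topic. Empty selections score zero."""
    if not selected:
        return 0.0
    return sum(
        max(max(0.0, cosine(embeddings[t], embeddings[s])) for s in selected)
        for t in embeddings
    )

def greedy_top_k(embeddings, k):
    """Repeatedly add the topic with the largest marginal coverage gain."""
    selected = []
    remaining = set(embeddings)
    while remaining and len(selected) < k:
        base = coverage(selected, embeddings)
        # sorted() makes tie-breaking deterministic across runs
        best = max(
            sorted(remaining),
            key=lambda t: coverage(selected + [t], embeddings) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy embeddings (illustrative values, not derived from Wikipedia)
TOPIC_EMBEDDINGS = {
    "machine learning": [1.0, 0.1],
    "deep learning": [0.9, 0.2],
    "neural networks": [0.95, 0.15],
    "knowledge representation": [0.1, 1.0],
    "ontologies": [0.2, 0.9],
}

print(greedy_top_k(TOPIC_EMBEDDINGS, 2))
```

Under this assumed objective, the greedy loop tends to pick one topic from each cluster of similar embeddings before picking a second topic from the same cluster, because a near-duplicate adds almost no coverage. For monotone submodular objectives such as this one, greedy selection is known to achieve at least a $(1 - 1/e)$ fraction of the optimal value; whether the paper's error bound takes this form is not stated in the abstract.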