Generating Categories for Sets of Entities
This work helps knowledge editors manually expand category systems, which are used in information access tasks. The contribution is incremental, as it builds on existing summarization and ranking methods.
The paper tackles the problem of expanding category systems in knowledge bases by generating categories for sets of entities, using neural abstractive summarization and ranking features to identify promising candidates, with effectiveness demonstrated on a Wikipedia-based test collection.
Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities. They are a unique and valuable resource that is utilized in a broad range of information access tasks. To aid knowledge editors in the manual process of expanding a category system, this paper presents a method for generating categories for sets of entities. First, we employ neural abstractive summarization models to generate candidate categories. Next, the location within the hierarchy is identified for each candidate. Finally, structure-, content-, and hierarchy-based features are used to rank the candidates and identify the most promising ones (measured in terms of specificity, hierarchy, and importance). We develop a test collection based on Wikipedia categories and demonstrate the effectiveness of the proposed approach.
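The three stages described above (candidate generation, placement in the hierarchy, feature-based ranking) can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions, not the paper's implementation: the names `Candidate`, `generate_candidates`, `locate_in_hierarchy`, and `rank_candidates` are hypothetical, the `summarize` callable stands in for a neural abstractive summarization model, the word-overlap placement is a toy substitute for the actual hierarchy step, and the features and weights are placeholders for the structure-, content-, and hierarchy-based features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    label: str          # proposed category name
    parent: str = ""    # existing category chosen as its parent
    score: float = 0.0  # ranking score (higher is better)


def generate_candidates(
    entities: List[str],
    summarize: Callable[[List[str]], List[str]],
) -> List[Candidate]:
    """Stage 1: ask a summarizer for labels and drop case-insensitive duplicates.

    `summarize` is an assumed stand-in for a neural abstractive summarizer.
    """
    seen = set()
    candidates = []
    for label in summarize(entities):
        key = label.strip().lower()
        if key and key not in seen:
            seen.add(key)
            candidates.append(Candidate(label=label.strip()))
    return candidates


def locate_in_hierarchy(candidate: Candidate, existing: List[str]) -> Candidate:
    """Stage 2 (toy): pick the existing category sharing the most words."""
    words = set(candidate.label.lower().split())
    candidate.parent = max(
        existing, key=lambda cat: len(words & set(cat.lower().split()))
    )
    return candidate


def rank_candidates(
    candidates: List[Candidate],
    features: Dict[str, Callable[[Candidate], float]],
    weights: Dict[str, float],
) -> List[Candidate]:
    """Stage 3: score each candidate as a weighted sum of feature values."""
    for cand in candidates:
        cand.score = sum(weights[name] * fn(cand) for name, fn in features.items())
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Toy usage: every value below is made up for illustration.
entities = ["Oslo", "Bergen", "Trondheim"]
toy_summarizer = lambda ents: ["Cities in Norway", "cities in norway", "Places"]
existing_categories = ["Cities in Europe", "Norway", "People"]

candidates = [
    locate_in_hierarchy(c, existing_categories)
    for c in generate_candidates(entities, toy_summarizer)
]
features = {
    # Placeholder "specificity": longer labels count as more specific.
    "length": lambda c: float(len(c.label.split())),
    # Placeholder "hierarchy" signal: words shared with the chosen parent.
    "parent_overlap": lambda c: float(
        len(set(c.label.lower().split()) & set(c.parent.lower().split()))
    ),
}
weights = {"length": 0.5, "parent_overlap": 1.0}
ranked = rank_candidates(candidates, features, weights)
```

Keeping the summarizer and the features as plain callables means that a trained model or learned feature weights could be swapped in without changing the pipeline's shape.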