A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion
This work addresses the need for more generalizable and holistic methods in taxonomy population tasks, which is incremental as it builds on existing techniques but integrates them into a unified approach.
The paper tackles the problem of automatically populating taxonomies with emerging concepts by unifying entity set expansion, taxonomy expansion, and seed-guided taxonomy construction into a single framework, achieving state-of-the-art performance across multiple benchmark datasets.
Entity set expansion, taxonomy expansion, and seed-guided taxonomy construction are three representative tasks that can be applied to automatically populate an existing taxonomy with emerging concepts. Previous studies view them as three separate tasks. Therefore, their proposed techniques usually work for one specific task only, lacking generalizability and a holistic perspective. In this paper, we aim at a unified solution to the three tasks. To be specific, we identify two common skills needed for entity set expansion, taxonomy expansion, and seed-guided taxonomy construction: finding "siblings" and finding "parents". We propose a taxonomy-guided instruction tuning framework to teach a large language model to generate siblings and parents for query entities, where the joint pre-training process facilitates the mutual enhancement of the two skills. Extensive experiments on multiple benchmark datasets demonstrate the efficacy of our proposed TaxoInstruct framework, which outperforms task-specific baselines across all three tasks.