Enriching Taxonomies Using Large Language Models
The work addresses ineffective knowledge retrieval caused by poor taxonomy quality, but the contribution is incremental: it applies existing LLM methods to a specific application.
The paper tackles the problem of limited coverage and outdated nodes in taxonomies by introducing Taxoria, a pipeline that uses Large Language Models to propose and validate candidate nodes for enrichment, resulting in an enhanced taxonomy with provenance tracking and visualization.
Taxonomies play a vital role in structuring and categorizing information across domains. However, many existing taxonomies suffer from limited coverage and outdated or ambiguous nodes, reducing their effectiveness in knowledge retrieval. To address this, we present Taxoria, a novel taxonomy enrichment pipeline that leverages Large Language Models (LLMs) to enhance a given taxonomy. Unlike approaches that extract an LLM's internal taxonomy, Taxoria uses an existing taxonomy as a seed and prompts an LLM to propose candidate nodes for enrichment. These candidates are then validated to mitigate hallucinations and ensure semantic relevance before integration. The output is an enriched taxonomy with provenance tracking, together with a visualization of the merged taxonomy for analysis.
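The abstract describes the seed, propose, validate, and merge loop only at a high level. The Python sketch below illustrates that loop under stated assumptions; it is not Taxoria's implementation. `propose` stands in for the LLM call and `is_valid` for the validation step. Both are hypothetical callbacks, stubbed here with canned answers. Case-insensitive duplicate rejection, provenance stored as a `source` string, and the indented text `render` used as a minimal "visualization" are illustrative choices and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Node:
    name: str
    parent: Optional[str]
    source: str  # provenance tag: "seed" or the step that proposed this node

@dataclass
class Taxonomy:
    nodes: Dict[str, Node] = field(default_factory=dict)

    def add(self, name: str, parent: Optional[str], source: str) -> None:
        self.nodes[name] = Node(name, parent, source)

    def children(self, parent: str) -> List[str]:
        return sorted(n.name for n in self.nodes.values() if n.parent == parent)

# Hypothetical callback types: an LLM-backed proposer and a validator.
Proposer = Callable[[str, List[str]], List[str]]
Validator = Callable[[str, str], bool]

def enrich(seed: Taxonomy, propose: Proposer, is_valid: Validator,
           tag: str = "llm") -> Taxonomy:
    """Propose children for every seed node, validate them, and merge survivors."""
    enriched = Taxonomy(dict(seed.nodes))  # copy the mapping so the seed stays unchanged
    for parent in list(seed.nodes):
        for raw in propose(parent, seed.children(parent)):
            name = raw.strip()
            existing = {n.lower() for n in enriched.nodes}
            if not name or name.lower() in existing:
                continue  # drop empty or duplicate (case-insensitive) names
            if not is_valid(parent, name):
                continue  # drop candidates the validator rejects (e.g. hallucinations)
            enriched.add(name, parent, f"{tag}:{parent}")  # record provenance
    return enriched

def render(tax: Taxonomy, root: str, depth: int = 0) -> List[str]:
    """Indented text view of the merged taxonomy, one node per line with its provenance."""
    lines = ["  " * depth + f"{root} [{tax.nodes[root].source}]"]
    for child in tax.children(root):
        lines.extend(render(tax, child, depth + 1))
    return lines

if __name__ == "__main__":
    seed = Taxonomy()
    seed.add("Science", None, "seed")
    seed.add("Physics", "Science", "seed")
    seed.add("Biology", "Science", "seed")
    # Stub standing in for the LLM call (hypothetical canned answers).
    canned = {"Physics": ["Optics", "physics", "Quantum Mechanics"],
              "Biology": ["Genetics", "Astrology"]}
    proposer = lambda parent, kids: canned.get(parent, [])
    # Stub validator: rejects one off-topic candidate.
    validator = lambda parent, name: name != "Astrology"
    print("\n".join(render(enrich(seed, proposer, validator), "Science")))
```

In this sketch, the duplicate "physics" is dropped by the name check and "Astrology" by the stub validator. The printed tree tags each added node with the parent it was proposed under. Only seed nodes are expanded here. A fuller pipeline might iterate over newly added nodes, and its validator would more plausibly be an LLM or embedding-based relevance check than a fixed rule.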