SE AI Aug 29, 2025

LLM-based Triplet Extraction for Automated Ontology Generation in Software Engineering Standards

arXiv:2509.00140v1 3.4 h-index: 1 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of scaling ontology use in software engineering by automating extraction from noisy, domain-specific text, though it is incremental in its method.

The paper tackles automated ontology generation from unstructured software engineering standards by proposing an LLM-assisted approach to relation triple extraction, reporting results that are comparable, and potentially superior, to OpenIE methods.

Ontologies have supported knowledge representation and whitebox reasoning for decades; automated ontology generation (AOG) therefore plays a crucial role in scaling their use. Software engineering standards (SES) consist of long, unstructured, noisy text whose paragraphs are dense with domain-specific terms. In this setting, relation triple extraction (RTE), together with term extraction, constitutes the first stage toward AOG. This work proposes an open-source large language model (LLM)-assisted approach to RTE for SES. Rather than relying solely on prompt engineering, the study promotes the use of LLMs as an aid in constructing ontologies and explores an effective AOG workflow that includes document segmentation, candidate term mining, LLM-based relation inference, term normalization, and cross-section alignment. Gold-standard benchmarks at three granularities are constructed and used to evaluate the generated ontology. The results show that the approach is comparable, and potentially superior, to the OpenIE method of triple extraction.
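The workflow stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: every function name here is hypothetical, the blank-line segmentation and naive plural stripping are assumptions, and `stub_infer` is a rule-based stand-in for the LLM call so the example runs without a model. Cross-section alignment is reduced to merging normalized triples into one set, and exact-match precision, recall, and F1 is an assumed metric, since the abstract does not say how the benchmarks are scored.

```python
import re
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (head term, relation, tail term)

def segment(document: str) -> list[str]:
    """Document segmentation: split on blank lines (assumed rule)."""
    return [s.strip() for s in re.split(r"\n\s*\n", document) if s.strip()]

def mine_terms(section: str, vocabulary: Iterable[str]) -> list[str]:
    """Candidate term mining: keep vocabulary entries found in the section."""
    text = section.lower()
    return [v for v in vocabulary if v.lower() in text]

def normalize(term: str) -> str:
    """Term normalization: lowercase, collapse spaces, strip a naive plural 's'."""
    t = re.sub(r"\s+", " ", term.strip().lower())
    return t[:-1] if t.endswith("s") and not t.endswith("ss") else t

def stub_infer(section: str, terms: list[str]) -> list[Triple]:
    """Hypothetical stand-in for LLM relation inference: '<term> includes <term>'."""
    low = section.lower()
    return [(h, "includes", t) for h in terms for t in terms
            if h != t and f"{h.lower()} includes {t.lower()}" in low]

def extract_triples(document: str, vocabulary: Iterable[str],
                    infer: Callable[[str, list[str]], list[Triple]]) -> set[Triple]:
    """Run all stages; the shared set merges duplicate triples across sections."""
    vocab = list(vocabulary)
    triples: set[Triple] = set()
    for section in segment(document):
        for head, rel, tail in infer(section, mine_terms(section, vocab)):
            triples.add((normalize(head), rel.strip().lower(), normalize(tail)))
    return triples

def prf(predicted: set[Triple], gold: set[Triple]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 against a gold triple set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

doc = ("The test plan includes test cases.\n\n"
       "The Test Plan includes test schedules.\n\n"
       "Every test plan includes test cases and is reviewed.")
predicted = extract_triples(doc, ["test plan", "test cases", "test schedules"], stub_infer)
gold = {("test plan", "includes", "test case"), ("test plan", "includes", "test report")}
print(sorted(predicted))
print(prf(predicted, gold))
```

In a real setting `stub_infer` would be replaced by a function that prompts a model with the section text and the candidate terms; passing the inference step as a parameter keeps the other stages testable without one.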
