CL | AI | Jul 24, 2025

Trusted Knowledge Extraction for Operations and Maintenance Intelligence

arXiv:2507.22935v31 citations | h-index: 13 | Has Code | Nat Lang Process J
Originality: Synthesis-oriented
AI Analysis

This work addresses trusted knowledge extraction for mission-critical industries such as aviation. Its contribution is incremental: it benchmarks existing tools rather than introducing new methods.

The paper tackles the challenge of extracting operational intelligence from confidential organizational data. It evaluates sixteen NLP tools, in concert with or in comparison to LLMs, for knowledge graph construction in the aircraft maintenance domain, and finds significant performance limitations in zero-shot settings.

Deriving operational intelligence from organizational data repositories is a key challenge due to the dichotomy of data confidentiality vs data integration objectives, as well as the limitations of Natural Language Processing (NLP) tools relative to the specific knowledge structure of domains such as operations and maintenance. In this work, we discuss Knowledge Graph construction and break down the Knowledge Extraction process into its Named Entity Recognition, Coreference Resolution, Named Entity Linking, and Relation Extraction functional components. We then evaluate sixteen NLP tools in concert with or in comparison to the rapidly advancing capabilities of Large Language Models (LLMs). We focus on the operational and maintenance intelligence use case for trusted applications in the aircraft industry. A baseline dataset is derived from a rich public domain US Federal Aviation Administration dataset focused on equipment failures or maintenance requirements. We assess the zero-shot performance of NLP and LLM tools that can be operated within a controlled, confidential environment (no data is sent to third parties). Based on our observation of significant performance limitations, we discuss the challenges related to trusted NLP and LLM tools as well as their Technical Readiness Level for wider use in mission-critical industries such as aviation. We conclude with recommendations to enhance trust and provide our open-source curated dataset to support further baseline testing and evaluation.
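The decomposition of Knowledge Extraction described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation or any of the sixteen evaluated tools: the gazetteer, the `kb:` identifiers, the `hasDefect` relation, and all function names are hypothetical, and each stage is a toy rule-based stand-in that only shows how the four components hand data to one another on the way to knowledge graph triples.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: surface form -> (entity type, canonical ID).
# A real system would use a trained model and a domain knowledge base.
GAZETTEER = {
    "fuel pump": ("COMPONENT", "kb:FuelPump"),
    "left engine": ("COMPONENT", "kb:LeftEngine"),
    "crack": ("DEFECT", "kb:Crack"),
}


@dataclass(frozen=True)
class Mention:
    text: str   # normalized surface form (a gazetteer key)
    start: int  # character offset within the sentence
    label: str  # entity type


def recognize_entities(sentence):
    """Named Entity Recognition: find gazetteer terms in the sentence."""
    mentions = []
    for term, (label, _) in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            mentions.append(Mention(term, m.start(), label))
    return sorted(mentions, key=lambda x: x.start)


def resolve_coreferences(sentence, mentions, antecedent):
    """Coreference Resolution: treat each 'it' as the last component
    mentioned in an earlier sentence (a deliberately naive heuristic)."""
    resolved = list(mentions)
    if antecedent is not None:
        for m in re.finditer(r"\bit\b", sentence, re.IGNORECASE):
            resolved.append(Mention(antecedent.text, m.start(), antecedent.label))
    return sorted(resolved, key=lambda x: x.start)


def link_entity(mention):
    """Named Entity Linking: map a mention to its canonical ID."""
    return GAZETTEER[mention.text][1]


def extract_relations(mentions):
    """Relation Extraction: attach each DEFECT to the closest preceding
    COMPONENT in the same sentence."""
    triples = []
    for i, mention in enumerate(mentions):
        if mention.label != "DEFECT":
            continue
        components = [x for x in mentions[:i] if x.label == "COMPONENT"]
        if components:
            triples.append(
                (link_entity(components[-1]), "hasDefect", link_entity(mention))
            )
    return triples


def build_triples(text):
    """Run the four stages sentence by sentence and collect triples."""
    triples = []
    antecedent = None
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        mentions = recognize_entities(sentence)
        mentions = resolve_coreferences(sentence, mentions, antecedent)
        triples.extend(extract_relations(mentions))
        components = [m for m in mentions if m.label == "COMPONENT"]
        if components:
            antecedent = components[-1]
    return triples


record = "Inspected the fuel pump. It had a crack."
print(build_triples(record))
```

In this sketch the coreference step is what lets the second sentence yield a triple at all: without it, "It had a crack" contains a defect but no component to attach it to. Heuristics this simple break down quickly on real maintenance text, which is consistent with the abstract's point that general-purpose tools face limitations on domain-specific knowledge structures.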

