TextMine: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action
This work addresses the limited transferability of information between humanitarian mine action agencies. Its contribution is incremental: it applies existing LLM methods to a new domain-specific dataset.
The authors tackle the problem of extracting structured knowledge from unstructured humanitarian mine action reports by introducing TextMine, which comprises a dataset, an evaluation framework, and an ontology-guided LLM pipeline. Compared to baselines, ontology-aligned prompts improve extraction accuracy by up to 44.2%, reduce hallucinations by 22.5%, and improve format adherence by 20.9%.
Humanitarian Mine Action (HMA) addresses the challenge of detecting and removing landmines from conflict regions. Much of the life-saving operational knowledge produced by HMA agencies is buried in unstructured reports, limiting the transferability of information between agencies. To address this issue, we propose TextMine: the first dataset, evaluation framework, and ontology-guided large language model (LLM) pipeline for knowledge extraction in the HMA domain. TextMine structures HMA reports into (subject, relation, object) triples, thus creating domain-specific knowledge. To ensure real-world relevance, we created the dataset in collaboration with the Cambodian Mine Action Center (CMAC). We further introduce a bias-aware evaluation framework that combines human-annotated triples with an LLM-as-Judge protocol to mitigate position bias in reference-free scoring. Our experiments show that ontology-aligned prompts improve extraction accuracy by up to 44.2%, reduce hallucinations by 22.5%, and enhance format adherence by 20.9% compared to baseline models. We publicly release the dataset and code.
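To make two ideas from the abstract concrete, the following is a minimal sketch rather than the released TextMine code. It assumes a toy ontology that maps each relation to an allowed (subject type, object type) pair, and it illustrates one common way to reduce position bias in an LLM-as-Judge setup: query the judge with the two candidates in both orders and accept a verdict only when the two answers agree. The names `Triple`, `TOY_ONTOLOGY`, `conforms`, and `debiased_preference`, the entity identifiers, and the judge's return values are all hypothetical; the protocol actually used in the paper may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) triple extracted from a report."""
    subject: str
    relation: str
    obj: str


# Hypothetical toy ontology: relation -> (allowed subject type, allowed object type).
TOY_ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "located_in": ("Minefield", "Location"),
    "contains": ("Minefield", "Ordnance"),
}


def conforms(
    triple: Triple,
    entity_types: Dict[str, str],
    ontology: Dict[str, Tuple[str, str]],
) -> bool:
    """Return True if the relation exists in the ontology and both
    arguments have the entity types that the relation expects."""
    signature = ontology.get(triple.relation)
    if signature is None:
        return False  # relation is not part of the ontology
    subject_type, object_type = signature
    return (
        entity_types.get(triple.subject) == subject_type
        and entity_types.get(triple.obj) == object_type
    )


# A judge receives two candidate triple lists and answers "first" or "second".
Judge = Callable[[List[Triple], List[Triple]], str]


def debiased_preference(judge: Judge, a: List[Triple], b: List[Triple]) -> str:
    """Ask the judge in both presentation orders. Return "a" or "b" only
    when the two verdicts agree on the same candidate, otherwise "tie"."""
    verdict_ab = judge(a, b)  # a is shown first
    verdict_ba = judge(b, a)  # b is shown first
    if verdict_ab == "first" and verdict_ba == "second":
        return "a"
    if verdict_ab == "second" and verdict_ba == "first":
        return "b"
    return "tie"  # verdict flipped with position, so it is not trusted
```

In practice `judge` would wrap an LLM call; here any callable with the same shape works, for example `debiased_preference(lambda x, y: "first", a, b)` yields `"tie"` because a judge that always prefers the first slot gives inconsistent verdicts once the order is swapped.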