AIDB, Jul 17, 2019

CLaRO: a Data-driven CNL for Specifying Competency Questions

arXiv:1907.07378v1, 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in ontology engineering: it provides a data-driven tool for authoring and validating competency questions. The advance is incremental, but it fills a gap left by the few existing templates, which have limited coverage.

The authors address the lack of a controlled natural language (CNL) for authoring Competency Questions (CQs) in ontology development. They propose CLaRO, a template-based CNL derived from a dataset of 234 CQs, which covers about 90% of unseen questions. The evaluation also found that about one third of the test questions were invalid.

Competency Questions (CQs) for an ontology and similar artefacts aim to provide insights into the contents of an ontology and to demarcate its scope. The absence of a controlled natural language, tooling and automation to support the authoring of CQs has hampered their effective use in ontology development and evaluation. The few question templates that exist are based on informal analyses of a small number of CQs and have limited coverage of question types and sentence constructions. We aim to fill this gap by proposing a template-based CNL to author CQs, called CLaRO. For its design, we exploited a new dataset of 234 CQs that had been processed automatically into 106 patterns, which we analysed and used to design a template-based CNL, with an additional CNL model and XML serialisation. The CNL was evaluated with a subset of questions from the original dataset and with two sets of newly sourced CQs. The coverage of CLaRO, with its 93 main templates and 41 linguistic variants, is about 90% for unseen questions. CLaRO has the potential to facilitate streamlining formalising ontology content requirements and, given that about one third of the competency questions in the test sets turned out to be invalid questions, assist in writing good questions.
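
To make the idea of template coverage concrete, here is a minimal Python sketch of how questions might be checked against a set of templates and how a coverage ratio could be computed. It is not the CLaRO implementation: the three templates, the `[EC…]`/`[PC…]` slot notation as used here, and the helper names `template_to_regex` and `coverage` are all illustrative assumptions, and matching is done at the plain string level rather than on linguistically chunked text.

```python
import re

# Hypothetical templates for illustration only; these are NOT the
# 93 CLaRO templates. [ECn] stands for a noun-phrase slot and
# [PCn] for a verb-phrase slot (an assumed notation).
TEMPLATES = [
    "What is [EC1]?",
    "Which [EC1] [PC1] [EC2]?",
    "How many [EC1] [PC1] [EC2]?",
]

# Matches a slot marker such as [EC1] or [PC2].
SLOT = re.compile(r"\[(?:EC|PC)\d+\]")


def template_to_regex(template):
    """Turn a template into a regex where each slot matches any text."""
    literal_parts = SLOT.split(template)
    pattern = r"(.+?)".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + pattern + "$", re.IGNORECASE)


def coverage(questions, templates=TEMPLATES):
    """Return the fraction of questions that match at least one template."""
    if not questions:
        return 0.0
    regexes = [template_to_regex(t) for t in templates]
    matched = [
        q for q in questions
        if any(rx.match(q.strip()) for rx in regexes)
    ]
    return len(matched) / len(questions)


if __name__ == "__main__":
    sample = [
        "What is a pizza?",
        "Which animals eat plants?",
        "Tell me about pizza.",
        "Is pizza vegetarian?",
    ]
    print(coverage(sample))
```

In this toy setup, a question that matches no template would simply count against coverage; deciding whether such a question is invalid or whether a template is missing would still need human judgement.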

Code Implementations: 1 repo