AI | Aug 12, 2025

Prompt-and-Check: Using Large Language Models to Evaluate Communication Protocol Compliance in Simulation-Based Training

arXiv:2508.08652v1 | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated assessment in safety-critical training domains and offers a lightweight, deployable way to augment debriefing and performance feedback. It is incremental, however, in that it applies existing LLM methods to a new application area.

The paper tackles the problem of evaluating communication protocol compliance in simulation-based training using prompt-based inference with open-source large language models on consumer-grade GPUs. In a maritime case study, classification accuracy and agreement scores indicate that this approach achieves effective context-aware reasoning without task-specific training.

Accurate evaluation of procedural communication compliance is essential in simulation-based training, particularly in safety-critical domains where adherence to compliance checklists reflects operational competence. This paper explores a lightweight, deployable approach using prompt-based inference with open-source large language models (LLMs) that can run efficiently on consumer-grade GPUs. We present Prompt-and-Check, a method that uses context-rich prompts to evaluate whether each checklist item in a protocol has been fulfilled, based solely on transcribed verbal exchanges. We perform a case study in the maritime domain with participants performing an identical simulation task, and experiment with models such as LLaMA 2 7B, LLaMA 3 8B and Mistral 7B, running locally on an RTX 4070 GPU. For each checklist item, a prompt incorporating relevant transcript excerpts is fed into the model, which outputs a compliance judgment. We assess model outputs against expert-annotated ground truth using classification accuracy and agreement scores. Our findings demonstrate that prompting enables effective context-aware reasoning without task-specific training. This study highlights the practical utility of LLMs in augmenting debriefing, performance feedback, and automated assessment in training environments.
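The per-item loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the YES/NO output format, the function names (`build_prompt`, `parse_judgment`, `check_compliance`), and the sample checklist are all hypothetical. The model is passed in as a plain callable so that any local backend could be swapped in. The abstract mentions "agreement scores" without naming a metric; Cohen's kappa is used here purely as an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical prompt wording; the paper's actual prompts are not given in this summary.
PROMPT_TEMPLATE = (
    "You are assessing communication protocol compliance in a training simulation.\n"
    "Checklist item: {item}\n"
    "Transcript excerpt:\n{excerpt}\n"
    "Was the checklist item fulfilled? Answer with exactly one word: YES or NO."
)


def build_prompt(item: str, excerpt: str) -> str:
    """Combine one checklist item with its relevant transcript excerpt."""
    return PROMPT_TEMPLATE.format(item=item, excerpt=excerpt)


def parse_judgment(raw: str) -> bool:
    """Treat a reply starting with 'yes' as compliant; anything else as non-compliant."""
    return raw.strip().lower().startswith("yes")


def check_compliance(
    checklist: List[str],
    excerpts: Dict[str, str],
    generate: Callable[[str], str],
) -> Dict[str, bool]:
    """Query the model once per checklist item and collect the judgments."""
    return {
        item: parse_judgment(generate(build_prompt(item, excerpts[item])))
        for item in checklist
    }


def accuracy(pred: List[bool], gold: List[bool]) -> float:
    """Fraction of items where the model agrees with the expert label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohen_kappa(pred: List[bool], gold: List[bool]) -> float:
    """Chance-corrected agreement for binary labels (assumed metric)."""
    n = len(gold)
    p_o = accuracy(pred, gold)
    pred_yes, gold_yes = sum(pred) / n, sum(gold) / n
    p_e = pred_yes * gold_yes + (1 - pred_yes) * (1 - gold_yes)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)


def fake_generate(prompt: str) -> str:
    """Stand-in for a local LLM call, used only to exercise the pipeline."""
    return "YES" if "Aurora" in prompt else "NO"


# Made-up checklist and transcript excerpts for illustration only.
checklist = ["States vessel name", "Reports position"]
excerpts = {
    "States vessel name": "This is motor vessel Aurora calling port control.",
    "Reports position": "We are ready to proceed.",
}
judgments = check_compliance(checklist, excerpts, fake_generate)
```

In practice `fake_generate` would be replaced by a call into whichever local inference runtime hosts the model. Asking for a single-word answer keeps parsing simple, at the cost of discarding any rationale the model might otherwise give.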

