CL · Jun 27, 2025

Assessing the feasibility of Large Language Models for detecting micro-behaviors in team interactions during space missions

arXiv:2506.22679v1 · 1 citation · h-index: 19 · INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of analyzing team communication dynamics for training interventions in high-stakes environments like space missions, but it is incremental as it applies existing LLM methods to a new domain-specific dataset.

The study tackled the problem of detecting subtle micro-behaviors in team conversations during simulated space missions using large language models, finding that instruction fine-tuned Llama-3.1 achieved macro F1-scores of 44% for 3-way classification and 68% for binary classification, outperforming encoder-only models.

We explore the feasibility of large language models (LLMs) in detecting subtle expressions of micro-behaviors in team conversations using transcripts collected during simulated space missions. Specifically, we examine zero-shot classification, fine-tuning, and paraphrase-augmented fine-tuning with encoder-only sequence classification LLMs, as well as few-shot text generation with decoder-only causal language modeling LLMs, to predict the micro-behavior associated with each conversational turn (i.e., dialogue). Our findings indicate that encoder-only LLMs, such as RoBERTa and DistilBERT, struggled to detect underrepresented micro-behaviors, particularly discouraging speech, even with weighted fine-tuning. In contrast, the instruction fine-tuned version of Llama-3.1, a decoder-only LLM, demonstrated superior performance, with the best models achieving macro F1-scores of 44% for 3-way classification and 68% for binary classification. These results have implications for the development of speech technologies aimed at analyzing team communication dynamics and enhancing training interventions in high-stakes environments such as space missions, particularly in scenarios where text is the only accessible data.
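The macro F1-scores reported above weight every class equally, which is why a model that misses an underrepresented micro-behavior is penalized heavily even when its accuracy looks good. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The label names and the toy data are assumptions chosen for illustration only; the abstract does not list the exact label set or class distribution.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted
        # or never present in the reference labels.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical 3-way label set and toy data (assumptions, not from the paper).
labels = ["encouraging", "neutral", "discouraging"]
y_true = ["neutral"] * 8 + ["encouraging", "discouraging"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels)
print(f"accuracy={accuracy:.2f} macro_f1={score:.3f}")
```

On this toy data the always-majority classifier reaches 0.80 accuracy but a macro F1 of only about 0.30, because the two minority classes each contribute an F1 of zero to the average.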

