54.1 HC Mar 15
What Are You Really Asking For? A Comparative 5W1H Analysis of Learner Questioning in CPR Training with IVAs in Screen-based and Augmented Reality Environments
Hyerim Park, Jinseok Hong, Heejeong Ko et al.
Question-asking is one of the key indicators of cognitive engagement. However, understanding how the distinct psychological affordances of presentation media shape learners' spoken inquiries with embodied Intelligent Virtual Agents (IVAs) remains limited. To systematically examine this process, we propose a 5W1H-based framework for analyzing learner questions. Using this framework, we conducted a user study comparing an Augmented Reality-based IVA (AR-IVA) deployed in the physical environment with a screen-based IVA (Video-IVA) during cardiopulmonary resuscitation (CPR) instruction. Results showed that the AR-IVA elicited higher spatial and social presence and promoted more frequent and longer questions focused on clarification and understanding. In contrast, the Video-IVA encouraged questions regarding procedural refinement. Presence acted as a selective filter, shaping the timing and topic of questions rather than as a universal mediator. These effects were significantly moderated by learners' motivational and strategic characteristics toward learning. Based on these findings, we propose design implications for IVA-supported learning systems.
AI Feb 26
Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models
Ik-hwan Kim, Hyeongrok Han, Mingi Jung et al.
Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks, failing to produce correct answers even after successfully deriving valid intermediate steps. Through systematic analysis, we observe that these failures frequently stem not from a lack of reasoning capacity, but from a deficiency in self-regulatory control, where valid logic is destabilized by uncontrolled exploration or the failure to recognize logical sufficiency. Motivated by this observation, we propose Metacognitive Behavioral Tuning (MBT), a post-training framework that explicitly injects metacognitive behaviors into the model's thought process. MBT implements this via two complementary formulations: (1) MBT-S, which synthesizes rigorous reasoning traces from scratch, and (2) MBT-R, which rewrites the student's initial traces to stabilize intrinsic exploration patterns. Experiments across multi-hop QA benchmarks demonstrate that MBT consistently outperforms baselines, achieving notable gains on challenging benchmarks. By effectively eliminating reasoning collapse, MBT achieves higher accuracy with significantly reduced token consumption, demonstrating that internalizing metacognitive strategies leads to more stable and robust reasoning.
CL May 26, 2025 Code
LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
Dongil Yang, Minjin Kim, Sunghwan Kim et al.
The remarkable reasoning and generalization capabilities of Large Language Models (LLMs) have paved the way for their expanding applications in embodied AI, robotics, and other real-world tasks. To effectively support these applications, grounding in spatial and temporal understanding in multimodal environments is essential. To this end, recent works have leveraged scene graphs, a structured representation that encodes entities, attributes, and their relationships in a scene. However, a comprehensive evaluation of LLMs' ability to utilize scene graphs remains limited. In this work, we introduce Text-Scene Graph (TSG) Bench, a benchmark designed to systematically assess LLMs' ability to (1) understand scene graphs and (2) generate them from textual narratives. With TSG Bench we evaluate 11 LLMs and reveal that, while models perform well on scene graph understanding, they struggle with scene graph generation, particularly for complex narratives. Our analysis indicates that these models fail to effectively decompose discrete scenes from a complex narrative, leading to a bottleneck when generating scene graphs. These findings underscore the need for improved methodologies in scene graph generation and provide valuable insights for future research. The demonstration of our benchmark is available at https://tsg-bench.netlify.app. Additionally, our code and evaluation data are publicly available at https://github.com/docworlds/tsg-bench.