SituationalLLM: Proactive language models with scene awareness for dynamic, contextual task guidance
This work addresses the need for environment-aware AI assistants that can give users real-world task guidance. The contribution is incremental, since it builds on existing LLMs and adds scene integration.
The paper tackles the problem that LLMs struggle to provide actionable guidance in physical environments because they lack scene awareness. It presents SituationalLLM, which integrates structured scene information to deliver proactive, context-aware assistance and is reported to outperform generic LLM baselines in task specificity, reliability, and adaptability.
Large language models (LLMs) have achieved remarkable success in text-based tasks but often struggle to provide actionable guidance in real-world physical environments, because they cannot recognize the limits of their understanding of the user's physical context. We present SituationalLLM, a novel approach that integrates structured scene information into an LLM to deliver proactive, context-aware assistance. By encoding objects, attributes, and relationships in a custom Scene Graph Language, SituationalLLM actively identifies gaps in environmental context and seeks clarifications during user interactions. This behavior emerges from training on the Situational Awareness Database for Instruct-Tuning (SAD-Instruct), which combines diverse, scenario-specific scene graphs with iterative, dialogue-based refinements. Experimental results indicate that SituationalLLM outperforms generic LLM baselines in task specificity, reliability, and adaptability, paving the way for environment-aware AI assistants capable of delivering robust, user-centric guidance under real-world constraints.
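To make the idea of encoding a scene and detecting missing context concrete, the following is a minimal Python sketch. It is not the paper's Scene Graph Language or its training pipeline: the abstract does not give the syntax, so the textual format, the class names (`SceneObject`, `SceneGraph`), and the rule-based `clarifying_questions` helper are all hypothetical. In SituationalLLM the clarification behavior is described as emerging from instruction tuning; the hand-written rule here only illustrates what "identifying a gap" could mean.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneObject:
    # Hypothetical node type: an object with free-form attributes.
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class SceneGraph:
    # Hypothetical container: objects plus (subject, predicate, object) triples.
    objects: dict[str, SceneObject] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name: str, **attributes: str) -> None:
        self.objects[name] = SceneObject(name, dict(attributes))

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.append((subject, predicate, obj))

    def serialize(self) -> str:
        # Assumed text format for placing the scene in an LLM prompt;
        # the real Scene Graph Language may look quite different.
        lines = []
        for obj in self.objects.values():
            attrs = ", ".join(f"{k}={v}" for k, v in sorted(obj.attributes.items()))
            lines.append(f"object {obj.name} [{attrs}]")
        for subject, predicate, target in self.relations:
            lines.append(f"relation {subject} {predicate} {target}")
        return "\n".join(lines)


def clarifying_questions(graph: SceneGraph, required: dict[str, list[str]]) -> list[str]:
    # `required` maps each object a task needs to the attributes it needs.
    # Anything absent from the graph becomes a question for the user.
    questions = []
    for name, attrs in required.items():
        obj = graph.objects.get(name)
        if obj is None:
            questions.append(f"Is there a {name} nearby?")
            continue
        for attr in attrs:
            if attr not in obj.attributes:
                questions.append(f"What is the {attr} of the {name}?")
    return questions


# Illustrative scene: a kettle on a counter, with its power source unknown.
graph = SceneGraph()
graph.add_object("kettle", state="empty")
graph.add_object("counter")
graph.add_relation("kettle", "on", "counter")

# Hypothetical task requirements for "make tea".
required = {"kettle": ["state", "power"], "mug": []}
questions = clarifying_questions(graph, required)
prompt_scene = graph.serialize()
```

In this sketch the serialized scene would be prepended to the user's request, and any returned questions would be asked before guidance is given. A trained model would be expected to decide this on its own rather than rely on an explicit requirements table.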