MRPoS: Mixed Reality-Based Robot Navigation Interface Using Spatial Pointing and Speech with Large Language Model
This work addresses the accessibility and efficiency of MR robot navigation interfaces, particularly for beginners. Its contribution is incremental, as it builds on existing MR and LLM technologies.
The paper tackles the problem of repetitive and physically demanding manual gestures in Mixed Reality (MR) robot navigation interfaces by proposing MRPoS, a novel framework that combines spatial pointing with LLM-based speech interaction. Compared with conventional systems, MRPoS significantly reduces task completion time and workload.
Recent advancements have made robot navigation more intuitive by transitioning from traditional 2D displays to spatially aware Mixed Reality (MR) systems. However, current MR interfaces often rely on manual "air tap" gestures for goal placement, which can be repetitive and physically demanding, especially for beginners. This paper proposes the Mixed Reality-Based Robot Navigation Interface using Spatial Pointing and Speech (MRPoS). This novel framework replaces complex hand gestures with a natural, multimodal interface that combines spatial pointing with Large Language Model (LLM)-based speech interaction. By combining both sources of information, the system translates verbal intent into navigation goals, which are visualized in MR. Comprehensive experiments comparing MRPoS against conventional gesture-based systems demonstrate that our approach significantly reduces task completion time and workload, providing a more accessible and efficient interface. For additional material, please check: https://mertcookimg.github.io/mrpos
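To make the pointing-plus-speech idea concrete, the following is a minimal sketch, not the MRPoS implementation, of how a spoken command and a pointing ray might be fused into a navigation goal. It assumes the LLM has already converted the utterance into a small JSON intent (the `action`/`target` schema is hypothetical), that the floor is the plane z = 0 in the world frame, and that a hypothetical `LANDMARKS` table maps named places to coordinates. A deictic target such as "there" is resolved by intersecting the pointing ray with the floor. The `fuse` and `intersect_floor` helpers and the yaw heuristic are illustrative assumptions.

```python
import json
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]

# Hypothetical named places in the world frame (metres); not from the paper.
LANDMARKS: Dict[str, Vec2] = {"charging station": (5.0, 2.0)}


@dataclass
class NavGoal:
    x: float
    y: float
    yaw: float  # heading in radians, world frame


def intersect_floor(origin: Vec3, direction: Vec3, floor_z: float = 0.0) -> Optional[Vec2]:
    """Intersect a pointing ray with the horizontal plane z = floor_z.

    Returns the (x, y) hit point, or None if the ray is parallel to the
    floor or points away from it.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-9:
        return None  # ray parallel to the floor
    t = (floor_z - oz) / dz
    if t <= 0:
        return None  # floor is behind the ray origin
    return (ox + t * dx, oy + t * dy)


def fuse(intent_json: str, origin: Vec3, direction: Vec3, robot_xy: Vec2) -> Optional[NavGoal]:
    """Combine a (hypothetical) LLM intent with a pointing ray into a goal."""
    intent = json.loads(intent_json)
    if intent.get("action") != "navigate":
        return None
    target = intent.get("target")
    if target == "pointed":
        # Deictic command ("go there"): resolve the location from the ray.
        xy = intersect_floor(origin, direction)
    else:
        # Named place ("go to the charging station"): look it up.
        xy = LANDMARKS.get(target)
    if xy is None:
        return None
    x, y = xy
    # Assumed heuristic: face the direction of travel from the robot.
    yaw = math.atan2(y - robot_xy[1], x - robot_xy[0])
    return NavGoal(x, y, yaw)
```

For example, a ray from (0, 0, 1.5) toward (1, 0, -0.5) hits the floor at (3, 0), so `fuse('{"action": "navigate", "target": "pointed"}', (0, 0, 1.5), (1, 0, -0.5), (0, 0))` returns `NavGoal(x=3.0, y=0.0, yaw=0.0)`. Keeping the LLM's output as structured data rather than free text lets the geometric step stay deterministic and easy to validate.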