NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
This work addresses the isolated development of GUI navigation and embodied navigation in AI systems. Its unified approach could streamline development across both areas, although the contribution appears incremental in that it combines existing paradigms.
The paper tackles the problem of unifying GUI and embodied navigation tasks, which have been developed separately. It formulates both as Markov Decision Processes and introduces NaviMaster, a unified agent reported to outperform state-of-the-art methods on benchmarks across both domains.
Recent advances in Graphical User Interface (GUI) navigation and embodied navigation have driven progress, yet these domains have largely evolved in isolation, with disparate datasets and training paradigms. In this paper, we observe that both tasks can be formulated as Markov Decision Processes (MDPs), suggesting a foundational principle for their unification. Hence, we present NaviMaster, the first agent to unify GUI navigation and embodied navigation within a single framework. Specifically, NaviMaster (i) proposes a visual-target trajectory collection pipeline that generates trajectories for both GUI and embodied tasks using a single formulation; (ii) employs a unified reinforcement learning framework on the mixed data to improve generalization; and (iii) designs a novel distance-aware reward to ensure efficient learning from the trajectories. Through extensive experiments on out-of-domain benchmarks, NaviMaster is shown to outperform state-of-the-art agents in GUI navigation, spatial affordance prediction, and embodied navigation. Ablation studies further demonstrate the efficacy of our unified training strategy, data mixing strategy, and reward design.
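To make the idea of a distance-aware reward over a shared action space concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify the reward's functional form, so everything here is an assumption for illustration: the `PointAction` type, the use of normalized image coordinates as the common action for both GUI clicks and embodied visual targets, the exponential decay, and the `scale` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PointAction:
    """Hypothetical shared action: a point in normalized image coordinates.

    Assumption: a GUI click and an embodied visual target can both be
    expressed as a point on the current observation image.
    """
    x: float  # 0.0 (left edge) to 1.0 (right edge)
    y: float  # 0.0 (top edge) to 1.0 (bottom edge)


def binary_reward(pred: PointAction, target: PointAction, tol: float = 0.05) -> float:
    """Baseline for comparison: 1.0 only if the prediction is within `tol`."""
    dist = math.hypot(pred.x - target.x, pred.y - target.y)
    return 1.0 if dist <= tol else 0.0


def distance_aware_reward(pred: PointAction, target: PointAction, scale: float = 0.1) -> float:
    """Assumed reward shape: 1.0 at the target, decaying smoothly with distance.

    The exponential form and `scale` are illustrative choices, not taken
    from the paper.
    """
    dist = math.hypot(pred.x - target.x, pred.y - target.y)
    return math.exp(-dist / scale)


if __name__ == "__main__":
    target = PointAction(0.5, 0.5)
    near = PointAction(0.6, 0.5)
    far = PointAction(0.9, 0.5)
    for name, pred in (("near", near), ("far", far)):
        print(name, binary_reward(pred, target), distance_aware_reward(pred, target))
```

Under these assumptions, the binary baseline gives the same zero reward to both the near miss and the far miss, whereas the distance-aware variant ranks the near miss higher. That graded signal is one plausible reading of why such a reward could make learning from trajectories more efficient.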