ROCVNov 18, 2025

RoboTidy : A 3D Gaussian Splatting Household Tidying Benchmark for Embodied Navigation and Action

arXiv:2511.14161v21 citations
Originality Synthesis-oriented
AI Analysis

This provides a scalable platform for evaluating embodied AI systems in household tidying, addressing a key gap in the field.

The authors tackled the lack of comprehensive benchmarks for household tidying in robotics by proposing RoboTidy, a unified benchmark that includes 500 photorealistic 3D scenes, 6.4k manipulation and 1.5k navigation trajectories, and supports real-world deployment for language-guided robots.

Household tidying is an important application area, yet current benchmarks neither model user preferences nor support mobility, and they generalize poorly, making it hard to comprehensively assess integrated language-to-action capabilities. To address this, we propose RoboTidy, a unified benchmark for language-guided household tidying that supports Vision-Language-Action (VLA) and Vision-Language-Navigation (VLN) training and evaluation. RoboTidy provides 500 photorealistic 3D Gaussian Splatting (3DGS) household scenes (covering 500 objects and containers) with collisions, formulates tidying as an "Action (Object, Container)" list, and supplies 6.4k high-quality manipulation demonstration trajectories and 1.5k naviagtion trajectories to support both few-shot and large-scale training. We also deploy RoboTidy in the real world for object tidying, establishing an end-to-end benchmark for household tidying. RoboTidy offers a scalable platform and bridges a key gap in embodied AI by enabling holistic and realistic evaluation of language-guided robots.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes