RO | CV | Mar 7, 2025

Learning High-Fidelity Robot Self-Model with Articulated 3D Gaussian Splatting

arXiv:2503.05398v2 | 2 citations | h-index: 3 | Int J Robot Res
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating detailed, texture-aware robot models with minimal human intervention. The advance is incremental: it improves modeling quality and reduces data acquisition costs relative to existing self-modeling methods.

The paper tackles robot self-modeling with a method that learns high-fidelity models of robot morphology, kinematics, and texture using 3D Gaussian splatting and neural networks. The resulting model works at the link level and supports downstream applications such as motion planning and inverse kinematics.

Self-modeling enables robots to build task-agnostic models of their morphology and kinematics based on data that can be automatically collected, with minimal human intervention and prior information, thereby enhancing machine intelligence. Recent research has highlighted the potential of data-driven technology in modeling the morphology and kinematics of robots. However, existing self-modeling methods suffer from either low modeling quality or excessive data acquisition costs. Beyond morphology and kinematics, texture is also a crucial component of robots, which is challenging to model and remains unexplored. In this work, a high-quality, texture-aware, and link-level method is proposed for robot self-modeling. We utilize three-dimensional (3D) Gaussians to represent the static morphology and texture of robots, and cluster the 3D Gaussians to construct neural ellipsoid bones, whose deformations are controlled by the transformation matrices generated by a kinematic neural network. The 3D Gaussians and kinematic neural network are trained using data pairs composed of joint angles, camera parameters and multi-view images without depth information. By feeding the kinematic neural network with joint angles, we can utilize the well-trained model to describe the corresponding morphology, kinematics and texture of robots at the link level, and render robot images from different perspectives with the aid of 3D Gaussian splatting. Furthermore, we demonstrate that the established model can be exploited to perform downstream tasks such as motion planning and inverse kinematics.
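
The deformation step described in the abstract (3D Gaussians grouped into ellipsoid bones and moved by per-bone transformation matrices) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes three assumptions: Gaussian centres are softly assigned to axis-aligned ellipsoids through a softmax over scaled squared distance, the per-bone transforms are blended linearly as in linear blend skinning, and a hand-written rotation stands in for the output of the kinematic neural network. All function names (`rigid_transform`, `ellipsoid_weights`, `deform_means`) and the `temperature` parameter are hypothetical. Only the Gaussian centres are moved; covariances and texture are left out.

```python
import numpy as np

def rigid_transform(angle_z, translation):
    """4x4 homogeneous transform: rotation about the z axis plus a translation.
    Hypothetical stand-in for a matrix produced by a kinematic network."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def ellipsoid_weights(means, centers, radii, temperature=1.0):
    """Soft assignment of N Gaussian centres to B axis-aligned ellipsoid bones.
    Assumption: softmax over negative squared distance scaled by the radii."""
    diff = (means[:, None, :] - centers[None, :, :]) / radii[None, :, :]  # (N, B, 3)
    d2 = np.sum(diff ** 2, axis=-1)                                       # (N, B)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def deform_means(means, weights, bone_transforms):
    """Blend the per-bone 4x4 transforms with the weights and apply them
    to the Gaussian centres (linear blend skinning)."""
    homo = np.concatenate([means, np.ones((len(means), 1))], axis=1)  # (N, 4)
    blended = np.einsum("nb,bij->nij", weights, bone_transforms)      # (N, 4, 4)
    moved = np.einsum("nij,nj->ni", blended, homo)                    # (N, 4)
    return moved[:, :3]

# Toy setup: two bones along the x axis, one Gaussian centre inside each.
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
radii = np.array([[1.0, 0.5, 0.5], [1.0, 0.5, 0.5]])
means = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# Bone 0 stays fixed; bone 1 rotates 90 degrees about z around the origin.
bone_transforms = np.stack([
    rigid_transform(0.0, [0.0, 0.0, 0.0]),
    rigid_transform(np.pi / 2, [0.0, 0.0, 0.0]),
])

weights = ellipsoid_weights(means, centers, radii, temperature=0.05)
deformed = deform_means(means, weights, bone_transforms)
print(deformed)
```

In a full pipeline along the lines of the abstract, `bone_transforms` would come from a network that takes joint angles as input, and the deformed Gaussians would be passed to a splatting renderer so that a loss against multi-view images can train both parts.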

