RO | CL | IT | LG | SY | Jul 26, 2025

Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

arXiv:2507.19947v2 | 1 citation | h-index: 19 | SMC
Originality: Incremental advance
AI Analysis

This work addresses the challenge of integrating uncertain human inputs for robots in collaborative tasks, representing an incremental advancement in uncertainty-aware fusion methods.

The paper tackles uncertainty-aware fusion of human spatial language and robot sensor data. It introduces a Feature Pyramid Likelihood Grounding Network (FP-LGN) that learns to ground spatial language while capturing aleatoric uncertainty, and the authors report significant improvements in human-robot collaborative task performance.

Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
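
The fusion step described in the abstract can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the target location is represented as a discrete belief over map grid cells, that the language and sensor observations are conditionally independent given the true location, and that each observation arrives as a per-cell likelihood map. The function names `fuse` and `nll` and all numeric values are hypothetical; in the paper, the language likelihood would come from FP-LGN rather than a hand-written array.

```python
import numpy as np

def fuse(prior, *likelihoods):
    """Bayes update on a grid: multiply the prior by each likelihood, then renormalize."""
    posterior = np.asarray(prior, dtype=float).copy()
    for lik in likelihoods:
        posterior = posterior * np.asarray(lik, dtype=float)
    total = posterior.sum()
    if total <= 0.0:
        raise ValueError("likelihoods assign zero probability to every cell")
    return posterior / total

def nll(distribution, true_cell, eps=1e-12):
    """Negative log-likelihood of the true cell under a normalized grid distribution."""
    return float(-np.log(distribution[true_cell] + eps))

# Hypothetical 2x2 map with a uniform prior over the target location.
prior = np.full((2, 2), 0.25)
# Made-up stand-in for a grounded language likelihood (e.g. "near the top-left").
language_likelihood = np.array([[0.8, 0.4], [0.2, 0.2]])
# Made-up stand-in for a robot sensor likelihood.
sensor_likelihood = np.array([[0.5, 0.9], [0.5, 0.5]])

posterior = fuse(prior, language_likelihood, sensor_likelihood)
```

Because both observations enter as likelihoods over the same grid, a vague language input (a flat likelihood map) changes the posterior very little, while a confident one sharpens it. The `nll` helper shows the kind of per-sample score that a mean NLL comparison would average over a test set.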
