Aliyah Smith

h-index13
2papers

2 Papers

ROMay 7, 2024
Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

Ola Shorinwa, Johnathan Tucker, Aliyah Smith et al.

We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills semantic and grasp affordance features into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical in many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real-world. SEE-Splat creates a "digital twin" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose affordance-aligned candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks and in four multi-stage manipulation tasks, using the edited scene to reflect changes due to prior manipulation stages, which is not possible with existing baselines. Video demonstrations and the code for the project are available at https://splatmover.github.io.

4.9ROMar 25
The Role of Consequential and Functional Sound in Human-Robot Interaction: Toward Audio Augmented Reality Interfaces

Aliyah Smith, Monroe Kennedy

Robot sound, encompassing both consequential operational noise and intentionally designed auditory cues, plays an important role in human-robot interaction (HRI). Developing a deeper understanding of how robot sounds influence human experience, and how technologies such as augmented reality (AR) modulate these effects, can enable the design of more socially acceptable robots and more effective, intuitive human-robot interfaces. In this work, we present a three-part mixed-methods study (N = 51) that investigates (i) the effects of consequential robot sounds on human perception under varying degrees of physical colocation, (ii) human accuracy in localizing spatial audio cues delivered via augmented reality, and (iii) the use of augmented spatial audio cues for functional and transformative communication during collaborative handover tasks, in comparison to non-AR sound designs. Contrary to prior findings, our results indicate that the consequential sounds of a Kinova Gen3 manipulator did not negatively affect participants' perceptions of the robot. Participants demonstrated high accuracy in localizing lateral spatial cues, whereas frontal cues proved more challenging, delineating conditions under which spatial auditory feedback is most effective. Qualitative findings further reveal that augmented spatial audio cues can simultaneously convey task-relevant information while fostering a sense of warmth and reducing user discomfort during interaction. Together, these findings elucidate the perceptual effects of consequential robot sound and position sound, particularly augmented spatial audio, as a meaningful yet underutilized design resource for human-robot interaction.