ROSep 22, 2024
SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded PlatformsNiraj Pudasaini, Muhammad Abdullah Hanif, Muhammad Shafique
Optimizing Deep Learning-based Simultaneous Localization and Mapping (DL-SLAM) algorithms is essential for efficient implementation on resource-constrained embedded platforms, enabling real-time on-board computation in autonomous mobile robots. This paper presents SPAQ-DL-SLAM, a framework that strategically applies Structured Pruning and Quantization (SPAQ) to the architecture of one of the state-ofthe-art DL-SLAM algorithms, DROID-SLAM, for resource and energy-efficiency. Specifically, we perform structured pruning with fine-tuning based on layer-wise sensitivity analysis followed by 8-bit post-training static quantization (PTQ) on the deep learning modules within DROID-SLAM. Our SPAQ-DROIDSLAM model, optimized version of DROID-SLAM model using our SPAQ-DL-SLAM framework with 20% structured pruning and 8-bit PTQ, achieves an 18.9% reduction in FLOPs and a 79.8% reduction in overall model size compared to the DROID-SLAM model. Our evaluations on the TUM-RGBD benchmark shows that SPAQ-DROID-SLAM model surpasses the DROID-SLAM model by an average of 10.5% on absolute trajectory error (ATE) metric. Additionally, our results on the ETH3D SLAM training benchmark demonstrate enhanced generalization capabilities of the SPAQ-DROID-SLAM model, seen by a higher Area Under the Curve (AUC) score and success in 2 additional data sequences compared to the DROIDSLAM model. Despite these improvements, the model exhibits performance variance on the distinct Vicon Room sequences from the EuRoC dataset, which are captured at high angular velocities. This varying performance at some distinct scenarios suggests that designing DL-SLAM algorithms taking operating environments and tasks in consideration can achieve optimal performance and resource efficiency for deployment in resource-constrained embedded platforms.
34.2ROApr 28
Egocentric Tactile and Proximity Sensors as Observation Priors for Humanoid Collision AvoidanceCarson Kohlbrenner, Niraj Pudasaini, William Xie et al.
Collision-free motion is often aided by tactile and proximity sensors distributed on the body of the robot due to their resistance to occlusion as opposed to external cameras. However, how to shape the sensor's properties, such as sensing coverage; type; and range, to enable avoidant behavior remains unclear. In this work, we present a reinforcement learning framework for whole-body collision avoidance on a humanoid H1-2 robot and use it to characterize how sensor properties shape learned avoidance behavior. Using dodgeball as a benchmark task, we ablate the properties of sensors distributed across the upper body of the robot and find that raw proximity measurements can substitute for explicit object localization provided the sensing range is sufficient and that sparse non-directional proximity signals outpace dense directional alternatives in sample efficiency.
ROApr 13, 2025
Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-ManipulationCongcong Wen, Geeta Chandra Raju Bethala, Yu Hao et al.
Humanoid loco-manipulation, which integrates whole-body locomotion with dexterous manipulation, remains a fundamental challenge in robotics. Beyond whole-body coordination and balance, a central difficulty lies in understanding human instructions and translating them into coherent sequences of embodied actions. Recent advances in foundation models provide transferable multimodal representations and reasoning capabilities, yet existing efforts remain largely restricted to either locomotion or manipulation in isolation, with limited applicability to humanoid settings. In this paper, we propose Humanoid-COA, the first humanoid agent framework that integrates foundation model reasoning with an Embodied Chain-of-Action (CoA) mechanism for zero-shot loco-manipulation. Within the perception--reasoning--action paradigm, our key contribution lies in the reasoning stage, where the proposed CoA mechanism decomposes high-level human instructions into structured sequences of locomotion and manipulation primitives through affordance analysis, spatial inference, and whole-body action reasoning. Extensive experiments on two humanoid robots, Unitree H1-2 and G1, in both an open test area and an apartment environment, demonstrate that our framework substantially outperforms prior baselines across manipulation, locomotion, and loco-manipulation tasks, achieving robust generalization to long-horizon and unstructured scenarios. Project page: https://humanoid-coa.github.io/