Baichuan Huang

RO
h-index13
13papers
422citations
Novelty46%
AI Score45

13 Papers

LGApr 8, 2024Code
LightFF: Lightweight Inference for Forward-Forward Algorithm

Amin Aminifar, Baichuan Huang, Azra Abtahi et al.

The human brain performs tasks with an outstanding energy efficiency, i.e., with approximately 20 Watts. The state-of-the-art Artificial/Deep Neural Networks (ANN/DNN), on the other hand, have recently been shown to consume massive amounts of energy. The training of these ANNs/DNNs is done almost exclusively based on the back-propagation algorithm, which is known to be biologically implausible. This has led to a new generation of forward-only techniques, including the Forward-Forward algorithm. In this paper, we propose a lightweight inference scheme specifically designed for DNNs trained using the Forward-Forward algorithm. We have evaluated our proposed lightweight inference scheme in the case of the MNIST and CIFAR datasets, as well as two real-world applications, namely, epileptic seizure detection and cardiac arrhythmia classification using wearable technologies, where complexity overheads/energy consumption is a major constraint, and demonstrate its relevance. Our code is available at https://github.com/AminAminifar/LightFF.

ROFeb 3, 2022Code
Interleaving Monte Carlo Tree Search and Self-Supervised Learning for Object Retrieval in Clutter

Baichuan Huang, Teng Guo, Abdeslam Boularias et al.

In this study, working with the task of object retrieval in clutter, we have developed a robot learning framework in which Monte Carlo Tree Search (MCTS) is first applied to enable a Deep Neural Network (DNN) to learn the intricate interactions between a robot arm and a complex scene containing many objects, allowing the DNN to partially clone the behavior of MCTS. In turn, the trained DNN is integrated into MCTS to help guide its search effort. We call this approach learning-guided Monte Carlo tree search for Object REtrieval (MORE), which delivers significant computational efficiency gains and added solution optimality. MORE is a self-supervised robotics framework/pipeline capable of working in the real world that successfully embodies the System 2 to System 1 learning philosophy proposed by Kahneman, where learned knowledge, used properly, can help greatly speed up a time-consuming decision process over time. Videos and supplementary material can be found at https://github.com/arc-l/more

ROOct 24, 2021Code
Fast High-Quality Tabletop Rearrangement in Bounded Workspace

Kai Gao, Darren Lau, Baichuan Huang et al.

In this paper, we examine the problem of rearranging many objects on a tabletop in a cluttered setting using overhand grasps. Efficient solutions for the problem, which capture a common task that we solve on a daily basis, are essential in enabling truly intelligent robotic manipulation. In a given instance, objects may need to be placed at temporary positions ("buffers") to complete the rearrangement, but allocating these buffer locations can be highly challenging in a cluttered environment. To tackle the challenge, a two-step baseline planner is first developed, which generates a primitive plan based on inherent combinatorial constraints induced by start and goal poses of the objects and then selects buffer locations assisted by the primitive plan. We then employ the "lazy" planner in a tree search framework which is further sped up by adapting a novel preprocessing routine. Simulation experiments show our methods can quickly generate high-quality solutions and are more robust in solving large-scale instances than existing state-of-the-art approaches. source:github.com/arc-l/TRLB

ROMay 6, 2021Code
Visual Foresight Trees for Object Retrieval from Clutter with Nonprehensile Rearrangement

Baichuan Huang, Shuai D. Han, Jingjin Yu et al.

This paper considers the problem of retrieving an object from many tightly packed objects using a combination of robotic pushing and grasping actions. Object retrieval in dense clutter is an important skill for robots to operate in households and everyday environments effectively. The proposed solution, Visual Foresight Trees (VFT), intelligently rearranges the clutter surrounding a target object so that it can be grasped easily. Rearrangement with nested nonprehensile actions is challenging as it requires predicting complex object interactions in a combinatorially large configuration space of multiple objects. We first show that a deep neural network can be trained to accurately predict the poses of the packed objects when the robot pushes one of them. The predictive network provides visual foresight and is used in a tree search as a state transition function in the space of scene images. The tree search returns a sequence of consecutive push actions yielding the best arrangement of the clutter for grasping the target object. Experiments in simulation and using a real robot and objects show that the proposed approach outperforms model-free techniques as well as model-based myopic methods both in terms of success rates and the number of executed actions, on several challenging tasks. A video introducing VFT, with robot experiments, is accessible at https://youtu.be/7cL-hmgvyec. The full source code is available at https://github.com/arc-l/vft.

RONov 9, 2020Code
DIPN: Deep Interaction Prediction Network with Application to Clutter Removal

Baichuan Huang, Shuai D. Han, Abdeslam Boularias et al.

We propose a Deep Interaction Prediction Network (DIPN) for learning to predict complex interactions that ensue as a robot end-effector pushes multiple objects, whose physical properties, including size, shape, mass, and friction coefficients may be unknown a priori. DIPN "imagines" the effect of a push action and generates an accurate synthetic image of the predicted outcome. DIPN is shown to be sample efficient when trained in simulation or with a real robotic system. The high accuracy of DIPN allows direct integration with a grasp network, yielding a robotic manipulation system capable of executing challenging clutter removal tasks while being trained in a fully self-supervised manner. The overall network demonstrates intelligent behavior in selecting proper actions between push and grasp for completing clutter removal tasks and significantly outperforms the previous state-of-the-art. Remarkably, DIPN achieves even better performance on the real robotic hardware system than in simulation. Videos, code, and experiments log are available at https://github.com/rutgers-arc-lab/dipn.

CVApr 30, 2020Code
M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network

Baichuan Huang, Hongwei Yi, Can Huang et al.

The present Multi-view stereo (MVS) methods with supervised learning-based networks have an impressive performance comparing with traditional MVS methods. However, the ground-truth depth maps for training are hard to be obtained and are within limited kinds of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M^3VSNet, for dense point cloud reconstruction without any supervision. To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss function to learn the inherent constraints from different perspectives of matching correspondences. Besides, we also incorporate the normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps. Experimental results show that M3VSNet establishes the state-of-the-arts unsupervised method and achieves comparable performance with previous supervised MVSNet on the DTU dataset and demonstrates the powerful generalization ability on the Tanks and Temples benchmark with effective improvement. Our code is available at https://github.com/whubaichuan/M3VSNet

CVApr 21, 2020Code
M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network

Baichuan Huang, Hongwei Yi, Can Huang et al.

The present Multi-view stereo (MVS) methods with supervised learning-based networks have an impressive performance comparing with traditional MVS methods. However, the ground-truth depth maps for training are hard to be obtained and are within limited kinds of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M^3VSNet, for dense point cloud reconstruction without any supervision. To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss function to learn the inherent constraints from different perspectives of matching correspondences. Besides, we also incorporate the normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps. Experimental results show that M3VSNet establishes the state-of-the-arts unsupervised method and achieves comparable performance with previous supervised MVSNet on the DTU dataset and demonstrates the powerful generalization ability on the Tanks and Temples benchmark with effective improvement. Our code is available at https://github.com/whubaichuan/M3VSNet.

ROAug 24, 2019Code
A Survey of Simultaneous Localization and Mapping with an Envision in 6G Wireless Networks

Baichuan Huang, Jun Zhao, Jingbin Liu

Simultaneous Localization and Mapping (SLAM) achieves the purpose of simultaneous positioning and map construction based on self-perception. The paper makes an overview in SLAM including Lidar SLAM, visual SLAM, and their fusion. For Lidar or visual SLAM, the survey illustrates the basic type and product of sensors, open source system in sort and history, deep learning embedded, the challenge and future. Additionally, visual inertial odometry is supplemented. For Lidar and visual fused SLAM, the paper highlights the multi-sensors calibration, the fusion in hardware, data, task layer. The open question and forward thinking with an envision in 6G wireless networks end the paper. The contributions of this paper can be summarized as follows: the paper provides a high quality and full-scale overview in SLAM. It's very friendly for new researchers to hold the development of SLAM and learn it very obviously. Also, the paper can be considered as a dictionary for experienced researchers to search and find new interesting orientation.

CLSep 19, 2025
BEFT: Bias-Efficient Fine-Tuning of Language Models

Baichuan Huang, Ananth Balashankar, Amir Aminifar

Fine-tuning all-bias-terms stands out among various parameter-efficient fine-tuning (PEFT) techniques, owing to its out-of-the-box usability and competitive performance, especially in low-data regimes. Bias-only fine-tuning has the potential for unprecedented parameter efficiency. However, the link between fine-tuning different bias terms (i.e., bias terms in the query, key, or value projections) and downstream performance remains unclear. The existing approaches, e.g., based on the magnitude of bias change or empirical Fisher information, provide limited guidance for selecting the particular bias term for effective fine-tuning. In this paper, we propose an approach for selecting the bias term to be fine-tuned, forming the foundation of our bias-efficient fine-tuning (BEFT). We extensively evaluate our bias-efficient approach against other bias-selection approaches, across a wide range of large language models (LLMs) spanning encoder-only and decoder-only architectures from 110M to 6.7B parameters. Our results demonstrate the effectiveness and superiority of our bias-efficient approach on diverse downstream tasks, including classification, multiple-choice, and generation tasks.

ROJun 20, 2025
Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping

Teng Guo, Baichuan Huang, Jingjin Yu

Accurate 6D object pose estimation is a prerequisite for successfully completing robotic prehensile and non-prehensile manipulation tasks. At present, 6D pose estimation for robotic manipulation generally relies on depth sensors based on, e.g., structured light, time-of-flight, and stereo-vision, which can be expensive, produce noisy output (as compared with RGB cameras), and fail to handle transparent objects. On the other hand, state-of-the-art monocular depth estimation models (MDEMs) provide only affine-invariant depths up to an unknown scale and shift. Metric MDEMs achieve some successful zero-shot results on public datasets, but fail to generalize. We propose a novel framework, Monocular One-shot Metric-depth Alignment (MOMA), to recover metric depth from a single RGB image, through a one-shot adaptation building on MDEM techniques. MOMA performs scale-rotation-shift alignments during camera calibration, guided by sparse ground-truth depth points, enabling accurate depth estimation without additional data collection or model retraining on the testing setup. MOMA supports fine-tuning the MDEM on transparent objects, demonstrating strong generalization capabilities. Real-world experiments on tabletop 2-finger grasping and suction-based bin-picking applications show MOMA achieves high success rates in diverse tasks, confirming its effectiveness.

ROFeb 3, 2022
Stackelberg Strategic Guidance for Heterogeneous Robots Collaboration

Yuhan Zhao, Baichuan Huang, Jingjin Yu et al.

In this study, we explore the application of game theory, in particular Stackelberg games, to address the issue of effective coordination strategy generation for heterogeneous robots with one-way communication. To that end, focusing on the task of multi-object rearrangement, we develop a theoretical and algorithmic framework that provides strategic guidance for a pair of robot arms, a leader and a follower where the leader has a model of the follower's decision-making process, through the computation of a feedback Stackelberg equilibrium. With built-in tolerance of model uncertainty, the strategic guidance generated by our planning algorithm not only improves the overall efficiency in solving the rearrangement tasks, but is also robust to common pitfalls in collaboration, e.g., chattering.

ROOct 8, 2019
Advanced Autonomy on a Low-Cost Educational Drone Platform

Luke Eller, Theo Guerin, Baichuan Huang et al.

PiDrone is a quadrotor platform created to accompany an introductory robotics course. Students build an autonomous flying robot from scratch and learn to program it through assignments and projects. Existing educational robots do not have significant autonomous capabilities, such as high-level planning and mapping. We present a hardware and software framework for an autonomous aerial robot, in which all software for autonomy can run onboard the drone, implemented in Python. We present an Unscented Kalman Filter (UKF) for accurate state estimation. Next, we present an implementation of Monte Carlo (MC) Localization and FastSLAM for Simultaneous Localization and Mapping (SLAM). The performance of UKF, localization, and SLAM is tested and compared to ground truth, provided by a motion-capture system. Our evaluation demonstrates that our autonomous educational framework runs quickly and accurately on a Raspberry Pi in Python, making it ideal for use in educational settings.

ROMay 28, 2019
Planning with State Abstractions for Non-Markovian Task Specifications

Yoonseon Oh, Roma Patel, Thao Nguyen et al.

Often times, we specify tasks for a robot using temporal language that can also span different levels of abstraction. The example command ``go to the kitchen before going to the second floor'' contains spatial abstraction, given that ``floor'' consists of individual rooms that can also be referred to in isolation ("kitchen", for example). There is also a temporal ordering of events, defined by the word "before". Previous works have used Linear Temporal Logic (LTL) to interpret temporal language (such as "before"), and Abstract Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such as "kitchen" and "second floor"), separately. To handle both types of commands at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a novel approach capable of representing non-Markovian reward functions at different levels of abstractions. The AP-MDP framework translates LTL into its corresponding automata, creates a product Markov Decision Process (MDP) of the LTL specification and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions. AP-MDP performs faster than a non-hierarchical method of solving LTL problems in over 95% of tasks, and this number only increases as the size of the environment domain increases. We also present a neural sequence-to-sequence model trained to translate language commands into LTL expression, and a new corpus of non-Markovian language commands spanning different levels of abstraction. We test our framework with the collected language commands on a drone, demonstrating that our approach enables a robot to efficiently solve temporal commands at different levels of abstraction.