Hung-Jui Huang

CV
h-index4
5papers
99citations
Novelty67%
AI Score47

5 Papers

71.5CVApr 10
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches

Langzhe Gu, Hung-Jui Huang, Mohamad Qadri et al.

Accurate object geometry estimation is essential for many downstream tasks, including robotic manipulation and physical interaction. Although vision is the dominant modality for shape perception, it becomes unreliable under occlusions or challenging lighting conditions. In such scenarios, tactile sensing provides direct geometric information through physical contact. However, reconstructing global 3D geometry from sparse local touches alone is fundamentally underconstrained. We present TouchAnything, a framework that leverages a pretrained large-scale 2D vision diffusion model as a semantic and geometric prior for 3D reconstruction from sparse tactile measurements. Unlike prior work that trains category-specific reconstruction networks or learns diffusion models directly from tactile data, we transfer the geometric knowledge encoded in pretrained visual diffusion models to the tactile domain. Given sparse contact constraints and a coarse class-level description of the object, we formulate reconstruction as an optimization problem that enforces tactile consistency while guiding solutions toward shapes consistent with the diffusion prior. Our method reconstructs accurate geometries from only a few touches, outperforms existing baselines, and enables open-world 3D reconstruction of previously unseen object instances. Our project page is https://grange007.github.io/touchanything .

CVJan 9, 2025
GelBelt: A Vision-based Tactile Sensor for Continuous Sensing of Large Surfaces

Mohammad Amin Mirzaee, Hung-Jui Huang, Wenzhen Yuan

Scanning large-scale surfaces is widely demanded in surface reconstruction applications and detecting defects in industries' quality control and maintenance stages. Traditional vision-based tactile sensors have shown promising performance in high-resolution shape reconstruction while suffering limitations such as small sensing areas or susceptibility to damage when slid across surfaces, making them unsuitable for continuous sensing on large surfaces. To address these shortcomings, we introduce a novel vision-based tactile sensor designed for continuous surface sensing applications. Our design uses an elastomeric belt and two wheels to continuously scan the target surface. The proposed sensor showed promising results in both shape reconstruction and surface fusion, indicating its applicability. The dot product of the estimated and reference surface normal map is reported over the sensing area and for different scanning speeds. Results indicate that the proposed sensor can rapidly scan large-scale surfaces with high accuracy at speeds up to 45 mm/s.

ROAug 21, 2025
GelSLAM: A Real-time, High-Fidelity, and Robust 3D Tactile SLAM System

Hung-Jui Huang, Mohammad Amin Mirzaee, Michael Kaess et al.

Accurately perceiving an object's pose and shape is essential for precise grasping and manipulation. Compared to common vision-based methods, tactile sensing offers advantages in precision and immunity to occlusion when tracking and reconstructing objects in contact. This makes it particularly valuable for in-hand and other high-precision manipulation tasks. In this work, we present GelSLAM, a real-time 3D SLAM system that relies solely on tactile sensing to estimate object pose over long periods and reconstruct object shapes with high fidelity. Unlike traditional point cloud-based approaches, GelSLAM uses tactile-derived surface normals and curvatures for robust tracking and loop closure. It can track object motion in real time with low error and minimal drift, and reconstruct shapes with submillimeter accuracy, even for low-texture objects such as wooden tools. GelSLAM extends tactile sensing beyond local contact to enable global, long-horizon spatial perception, and we believe it will serve as a foundation for many precise manipulation tasks involving interaction with objects in hand. The video demo is available on our website: https://joehjhuang.github.io/gelslam.

ROJun 16, 2021
Planning on a (Risk) Budget: Safe Non-Conservative Planning in Probabilistic Dynamic Environments

Hung-Jui Huang, Kai-Chi Huang, Michal Čáp et al.

Planning in environments with other agents whose future actions are uncertain often requires compromise between safety and performance. Here our goal is to design efficient planning algorithms with guaranteed bounds on the probability of safety violation, which nonetheless achieve non-conservative performance. To quantify a system's risk, we define a natural criterion called interval risk bounds (IRBs), which provide a parametric upper bound on the probability of safety violation over a given time interval or task. We present a novel receding horizon algorithm, and prove that it can satisfy a desired IRB. Our algorithm maintains a dynamic risk budget which constrains the allowable risk at each iteration, and guarantees recursive feasibility by requiring a safe set to be reachable by a contingency plan within the budget. We empirically demonstrate that our algorithm is both safer and less conservative than strong baselines in two simulated autonomous driving experiments in scenarios involving collision avoidance with other vehicles, and additionally demonstrate our algorithm running on an autonomous class 8 truck.

CVMar 12, 2018
Omnidirectional CNN for Visual Place Recognition and Navigation

Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin et al.

$ $Visual place recognition is challenging, especially when only a few place exemplars are given. To mitigate the challenge, we consider place recognition method using omnidirectional cameras and propose a novel Omnidirectional Convolutional Neural Network (O-CNN) to handle severe camera pose variation. Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place. With the ability to estimate relative distance, a heuristic policy is proposed to navigate a robot to the retrieved closest place. Note that the network is designed to take advantage of the omnidirectional view by incorporating circular padding and rotation invariance. To train a powerful O-CNN, we build a virtual world for training on a large scale. We also propose a continuous lifted structured feature embedding loss to learn the concept of distance efficiently. Finally, our experimental results confirm that our method achieves state-of-the-art accuracy and speed with both the virtual world and real-world datasets.