Zherong Pan

RO
h-index20
34papers
557citations
Novelty55%
AI Score54

34 Papers

LGDec 5, 2022
Learning Physically Realizable Skills for Online Packing of General 3D Shapes

Hang Zhao, Zherong Pan, Yang Yu et al.

We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems. The goal is to consecutively move a sequence of 3D objects with arbitrary shapes into a designated container with only partial observations of the object sequence. Meanwhile, we take physical realizability into account, involving physics dynamics and constraints of a placement. The packing policy should understand the 3D geometry of the object to be packed and make effective decisions to accommodate it in the container in a physically realizable way. We propose a Reinforcement Learning (RL) pipeline to learn the policy. The complex irregular geometry and imperfect object placement together lead to huge solution space. Direct training in such space is prohibitively data intensive. We instead propose a theoretically-provable method for candidate action generation to reduce the action space of RL and the learning burden. A parameterized policy is then learned to select the best placement from the candidates. Equipped with an efficient method of asynchronous RL acceleration and a data preparation process of simulation-ready training sequences, a mature packing policy can be trained in a physics-based environment within 48 hours. Through extensive evaluation on a variety of real-life shape datasets and comparisons with state-of-the-art baselines, we demonstrate that our method outperforms the best-performing baseline on all datasets by at least 12.8% in terms of packing utility.

GRSep 19, 2023
Learning based 2D Irregular Shape Packing

Zeshi Yang, Zherong Pan, Manyi Li et al.

2D irregular shape packing is a necessary step to arrange UV patches of a 3D model within a texture atlas for memory-efficient appearance rendering in computer graphics. Being a joint, combinatorial decision-making problem involving all patch positions and orientations, this problem has well-known NP-hard complexity. Prior solutions either assume a heuristic packing order or modify the upstream mesh cut and UV mapping to simplify the problem, which either limits the packing ratio or incurs robustness or generality issues. Instead, we introduce a learning-assisted 2D irregular shape packing method that achieves a high packing quality with minimal requirements from the input. Our method iteratively selects and groups subsets of UV patches into near-rectangular super patches, essentially reducing the problem to bin-packing, based on which a joint optimization is employed to further improve the packing ratio. In order to efficiently deal with large problem instances with hundreds of patches, we train deep neural policies to predict nearly rectangular patch subsets and determine their relative poses, leading to linear time scaling with the number of patches. We demonstrate the effectiveness of our method on three datasets for UV packing, where our method achieves a higher packing ratio over several widely used baselines with competitive computational speed.

CVMay 17, 2024Code
TexPainter: Generative Mesh Texturing with Multi-view Consistency

Hongkun Zhang, Zherong Pan, Congyi Zhang et al.

The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our method is based on the observation that latent space in a pre-trained diffusion model is noised separately for each camera view, making it difficult to achieve multi-view consistency by directly manipulating the latent codes. Based on the celebrated Denoising Diffusion Implicit Models (DDIM) scheme, we propose to use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation. Our method further relaxes the sequential dependency assumption among the camera views. By evaluating on a series of general 3D models, we find our simple approach improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts. Our implementation is available at: https://github.com/Quantuman134/TexPainter

CVJul 11, 2024
SRPose: Two-view Relative Pose Estimation with Sparse Keypoints

Rui Yin, Yulun Zhang, Zherong Pan et al.

Two-view pose estimation is essential for map-free visual relocalization and object pose tracking tasks. However, traditional matching methods suffer from time-consuming robust estimators, while deep learning-based pose regressors only cater to camera-to-world pose estimation, lacking generalizability to different image sizes and camera intrinsics. In this paper, we propose SRPose, a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios. SRPose consists of a sparse keypoint detector, an intrinsic-calibration position encoder, and promptable prior knowledge-guided attention layers. Given two RGB images of a fixed scene or a moving object, SRPose estimates the relative camera or 6D object pose transformation. Extensive experiments demonstrate that SRPose achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed, showing generalizability to both scenarios. It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.

71.1GRMay 14
Unified Simulation of Lagrangian Particle Dynamics via Transformer

Caoliwen Wang, Minghao Guo, Siyuan Chen et al.

A unified simulator that can model diverse physical phenomena without solver-specific redesign is a long-standing goal across simulation science. We present a learning-based particle simulator built on a single transformer architecture to model cloth, elastic solds, Newtonian and non-Newtonian fluids, granular materials, and molecular dynamics. Our model follows a prediction-correction design on a shared Lagrangian particle representation. An explicit predictor first advances particles under the known external forces, producing an intermediate state that captures externally driven motion but not inter-particle interactions. A learned corrector then predicts the residual position and velocity updates through three stages: a particle tokenizer that encodes local particle-particle, particle-boundary, and topology-guided interactions; a super-token encoder that hierarchically merges particle tokens into a compact set of super tokens via alternating self-attention and token merging; and a super-token decoder that lifts these super tokens back to particle resolution through cross-attention to predict per-particle position and velocity corrections. Progressive token merging reduces the attention cost at successive encoder layers by halving the token count at each level, and the decoder communicates through the compact super-token set rather than full particle-to-particle attention. Across the six dynamics categories, the same architecture generalizes to unseen materials, boundary configurations, initial conditions, and external forces. We further demonstrate downstream interactive control, inverse design, and learning from real-world manipulation data, reducing the need for per-phenomenon solver engineering.

ROFeb 23
Simulation-Ready Cluttered Scene Estimation via Physics-aware Joint Shape and Pose Optimization

Wei-Cheng Huang, Jiaheng Han, Xiaohan Ye et al.

Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Regretfully, existing methods struggle in cluttered environments, often exhibiting prohibitive computational cost, poor robustness, and restricted generality when scaling to multiple interacting objects. We propose a unified optimization-based formulation for real-to-sim scene estimation that jointly recovers the shapes and poses of multiple rigid objects under physical constraints. Our method is built on two key technical innovations. First, we leverage the recently introduced shape-differentiable contact model, whose global differentiability permits joint optimization over object geometry and pose while modeling inter-object contacts. Second, we exploit the structured sparsity of the augmented Lagrangian Hessian to derive an efficient linear system solver whose computational cost scales favorably with scene complexity. Building on this formulation, we develop an end-to-end real-to-sim scene estimation pipeline that integrates learning-based object initialization, physics-constrained joint shape-pose optimization, and differentiable texture refinement. Experiments on cluttered scenes with up to 5 objects and 22 convex hulls demonstrate that our approach robustly reconstructs physically valid, simulation-ready object shapes and poses.

35.7GRApr 4
Real-time Neural Six-way Lightmaps

Wei Li, Hanxiao Sun, Tao Huang et al.

Participating media are a pervasive and intriguing visual effect in virtual environments. Unfortunately, rendering such phenomena in real-time is notoriously difficult due to the computational expense of estimating the volume rendering equation. While the six-way lightmaps technique has been widely used in video games to render smoke with a camera-oriented billboard and approximate lighting effects using six precomputed lightmaps, achieving a balance between realism and efficiency, it is limited to pre-simulated animation sequences and is ignorant of camera movement. In this work, we propose a neural six-way lightmaps method to strike a long-sought balance between dynamics and visual realism. Our approach first generates a guiding map from the camera view using ray marching with a large sampling distance to approximate smoke scattering and silhouette. Then, given a guiding map, we train a neural network to predict the corresponding six-way lightmaps. The resulting lightmaps can be seamlessly used in existing game engine pipelines. This approach supports visually appealing rendering effects while enabling real-time user interactivity, including smoke-obstacle interaction, camera movement, and light change. By conducting a series of comprehensive benchmarks, we demonstrate that our method is well-suited for real-time applications, such as games and VR/AR.

ROMay 9, 2025
Physics-informed Temporal Difference Metric Learning for Robot Motion Planning

Ruiqi Ni, Zherong Pan, Ahmed H Qureshi

The motion planning problem involves finding a collision-free path from a robot's starting to its target configuration. Recently, self-supervised learning methods have emerged to tackle motion planning problems without requiring expensive expert demonstrations. They solve the Eikonal equation for training neural networks and lead to efficient solutions. However, these methods struggle in complex environments because they fail to maintain key properties of the Eikonal equation, such as optimal value functions and geodesic distances. To overcome these limitations, we propose a novel self-supervised temporal difference metric learning approach that solves the Eikonal equation more accurately and enhances performance in solving complex and unseen planning tasks. Our method enforces Bellman's principle of optimality over finite regions, using temporal difference learning to avoid spurious local minima while incorporating metric learning to preserve the Eikonal equation's essential geodesic properties. We demonstrate that our approach significantly outperforms existing self-supervised learning methods in handling complex environments and generalizing to unseen environments, with robot configurations ranging from 2 to 12 degrees of freedom (DOF).

RONov 29, 2024
One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering

Yifan Zhu, Tianyi Xiang, Aaron Dollar et al.

Identifying predictive world models for robots in novel environments from sparse online observations is essential for robot task planning and execution in novel environments. However, existing methods that leverage differentiable programming to identify world models are incapable of jointly optimizing the geometry, appearance, and physical properties of the scene. In this work, we introduce a novel rigid object representation that allows the joint identification of these properties. Our method employs a novel differentiable point-based geometry representation coupled with a grid-based appearance field, which allows differentiable object collision detection and rendering. Combined with a differentiable physical simulator, we achieve end-to-end optimization of world models, given the sparse visual and tactile observations of a physical motion sequence. Through a series of world model identification tasks in simulated and real environments, we show that our method can learn both simulation- and rendering-ready world models from only one robot action sequence. The code and additional videos are available at our project website: https://tianyi20.github.io/rigid-world-model.github.io/

GRJun 5, 2025
Handle-based Mesh Deformation Guided By Vision Language Model

Xingpeng Sun, Shiyang Jia, Zherong Pan et al.

Mesh deformation is a fundamental tool in 3D content manipulation. Despite extensive prior research, existing approaches often suffer from low output quality, require significant manual tuning, or depend on data-intensive training. To address these limitations, we introduce a training-free, handle-based mesh deformation method. % Our core idea is to leverage a Vision-Language Model (VLM) to interpret and manipulate a handle-based interface through prompt engineering. We begin by applying cone singularity detection to identify a sparse set of potential handles. The VLM is then prompted to select both the deformable sub-parts of the mesh and the handles that best align with user instructions. Subsequently, we query the desired deformed positions of the selected handles in screen space. To reduce uncertainty inherent in VLM predictions, we aggregate the results from multiple camera views using a novel multi-view voting scheme. % Across a suite of benchmarks, our method produces deformations that align more closely with user intent, as measured by CLIP and GPTEval3D scores, while introducing low distortion -- quantified via membrane energy. In summary, our approach is training-free, highly automated, and consistently delivers high-quality mesh deformations.

CVJun 7, 2024
Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency

Junhao Chen, Manyi Li, Zherong Pan et al.

Deep generative models learn the data distribution, which is concentrated on a low-dimensional manifold. The geometric analysis of distribution transformation provides a better understanding of data structure and enables a variety of applications. In this paper, we study the geometric properties of the diffusion model, whose forward diffusion process and reverse generation process construct a series of distributions on manifolds which vary over time. Our key contribution is the introduction of generation rate, which corresponds to the local deformation of manifold over time around an image component. We show that the generation rate is highly correlated with intuitive visual properties, such as visual saliency, of the image component. Further, we propose an efficient and differentiable scheme to estimate the generation rate for a given image component over time, giving rise to a generation curve. The differentiable nature of our scheme allows us to control the shape of the generation curve via optimization. Using different loss functions, our generation curve matching algorithm provides a unified framework for a range of image manipulation tasks, including semantic transfer, object removal, saliency manipulation, image blending, etc. We conduct comprehensive analytical evaluations to support our findings and evaluate our framework on various manipulation tasks. The results show that our method consistently leads to better manipulation results, compared to recent baselines.

RONov 13, 2021
Robust Multi-Robot Trajectory Optimization Using Alternating Direction Method of Multiplier

Ruiqi Ni, Zherong Pan, Xifeng Gao

We propose a variant of alternating direction method of multiplier (ADMM) to solve constrained trajectory optimization problems. Our ADMM framework breaks a joint optimization into small sub-problems, leading to a low iteration cost and decentralized parameter updates. Starting from a collision-free initial trajectory, our method inherits the theoretical properties of primal interior point method (P-IPM), i.e., guaranteed collision avoidance and homotopy preservation throughout optimization, while being orders of magnitude faster. We have analyzed the convergence and evaluated our method for time-optimal multi-UAV trajectory optimizations and simultaneous goal-reaching of multiple robot arms, where we take into consider kinematics-, dynamics-limits, and homotopy-preserving collision constraints. Our method highlights an order of magnitude's speedup, while generating trajectories of comparable qualities as state-of-the-art P-IPM solver.

CVOct 8, 2021
Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations

Qingyang Tan, Zherong Pan, Breannan Smith et al.

We present a robust learning algorithm to detect and handle collisions in 3D deforming meshes. Our collision detector is represented as a bilevel deep autoencoder with an attention mechanism that identifies colliding mesh sub-parts. We use a numerical optimization algorithm to resolve penetrations guided by the network. Our learned collision handler can resolve collisions for unseen, high-dimensional meshes with thousands of vertices. To obtain stable network performance in such large and unseen spaces, we progressively insert new collision data based on the errors in network inferences. We automatically label these data using an analytical collision detector and progressively fine-tune our detection networks. We evaluate our method for collision handling of complex, 3D meshes coming from several datasets with different shapes and topologies, including datasets corresponding to dressed and undressed human poses, cloth simulations, and human hand poses acquired using multiview capture systems. Our approach outperforms supervised learning methods and achieves $93.8-98.1\%$ accuracy compared to the groundtruth by analytic methods. Compared to prior learning methods, our approach results in a $5.16\%-25.50\%$ lower false negative rate in terms of collision checking and a $9.65\%-58.91\%$ higher success rate in collision handling.

ROSep 8, 2021
Joint Search of Optimal Topology and Trajectory for Planar Linkages

Zherong Pan, Min Liu, Xifeng Gao et al.

We present an algorithm to compute planar linkage topology and geometry, given a user-specified end-effector trajectory. Planar linkage structures convert rotational or prismatic motions of a single actuator into an arbitrarily complex periodic motion, \refined{which is an important component when building low-cost, modular robots, mechanical toys, and foldable structures in our daily lives (chairs, bikes, and shelves). The design of such structures require trial and error even for experienced engineers. Our research provides semi-automatic methods for exploring novel designs given high-level specifications and constraints.} We formulate this problem as a non-smooth numerical optimization with quadratic objective functions and non-convex quadratic constraints involving mixed-integer decision variables (MIQCQP). We propose and compare three approximate algorithms to solve this problem: mixed-integer conic-programming (MICP), mixed-integer nonlinear programming (MINLP), and simulated annealing (SA). We evaluated these algorithms searching for planar linkages involving $10-14$ rigid links. Our results show that the best performance can be achieved by combining MICP and MINLP, leading to a hybrid algorithm capable of finding the planar linkages within a couple of hours on a desktop machine, which significantly outperforms the SA baseline in terms of optimality. We highlight the effectiveness of our optimized planar linkages by using them as legs of a walking robot.

ROAug 7, 2021
Multi-Robot Path Planning in Complex Environments via Graph Embedding

Xifeng Gao, Zherong Pan, Ruiqi Ni

We propose an approach to solve multi-agent path planning (MPP) problems for complex environments. Our method first designs a special pebble graph with a set of feasibility constraints, under which MPP problems have feasibility guarantee. We further propose an algorithm to greedily improve the optimality of planned MPP solutions via parallel pebble motions. As a second step, we develop a mesh optimization algorithm to embed our pebble graph into arbitrarily complex environments. We show that the feasibility constraints of a pebble graph can be converted into differentiable geometric constraints, such that our mesh optimizer can satisfy these constraints via constrained numerical optimization. We have evaluated the effectiveness and efficiency of our method using a set of environments with complex geometries, on which our method achieves an average of 99.0% free-space coverage and 30.3% robot density within hours of computation on a desktop machine.

ROJul 31, 2021
Planning of Power Grasps Using Infinite Program Under Complementary Constraints

Zherong Pan, Duo Zhang, Changhe Tu et al.

We propose an optimization-based approach to plan power grasps. Central to our method is a reformulation of grasp planning as an infinite program under complementary constraints (IPCC), which allows contacts to happen between arbitrary pairs of points on the object and the robot gripper. We show that IPCC can be reduced to a conventional finite-dimensional nonlinear program (NLP) using a kernel-integral relaxation. Moreover, the values and Jacobian matrices of the kernel-integral can be evaluated efficiently using a modified Fast Multipole Method (FMM). We further guarantee that the planned grasps are collision-free using primal barrier penalties. We demonstrate the effectiveness, robustness, and efficiency of our grasp planner on a row of challenging 3D objects and high-DOF grippers, such as Barrett Hand and Shadow Hand, where our method achieves superior grasp qualities over competitors.

ROMar 25, 2021
Optimized Coverage Planning for UV Surface Disinfection

Joao Marcos Correia Marques, Ramya Ramalingam, Zherong Pan et al.

UV radiation has been used as a disinfection strategy to deactivate a wide range of pathogens, but existing irradiation strategies do not ensure sufficient exposure of all environmental surfaces and/or require long disinfection times. We present a near-optimal coverage planner for mobile UV disinfection robots. The formulation optimizes the irradiation time efficiency, while ensuring that a sufficient dosage of radiation is received by each surface. The trajectory and dosage plan are optimized taking collision and light occlusion constraints into account. We propose a two-stage scheme to approximate the solution of the induced NP-hard optimization, and, for efficiency, perform key irradiance and occlusion calculations on a GPU. Empirical results show that our technique achieves more coverage for the same exposure time as strategies for existing UV robots, can be used to compare UV robot designs, and produces near-optimal plans. This is an extended version of the paper originally contributed to ICRA2021.

ROOct 28, 2020
Implicit Integration for Articulated Bodies with Contact via the Nonconvex Maximal Dissipation Principle

Zherong Pan, Kris Hauser

We present non-convex maximal dissipation principle (NMDP), a time integration scheme for articulated bodies with simultaneous contacts. Our scheme resolves contact forces via the maximal dissipation principle (MDP). Prior MDP solvers compute contact forces via convex programming by assuming linearized dynamics integrated using the forward multistep scheme. Instead, we consider the coupled system of nonlinear Newton-Euler dynamics and MDP, which is time-integrated using the backward integration scheme. We show that the coupled system of equations can be solved efficiently using the projected gradient method with guaranteed convergence. We evaluate our method by predicting several locomotion trajectories for a quadruped robot. The results show that our NMDP scheme has several desirable properties including: (1) generalization to novel contact models; (2) superior stability under large timestep sizes; (3) consistent trajectory generation under varying timestep sizes.

ROOct 20, 2020
Decision Making in Joint Push-Grasp Action Space for Large-Scale Object Sorting

Zherong Pan, Kris Hauser

We present a planner for large-scale (un)labeled object sorting tasks, which uses two types of manipulation actions: overhead grasping and planar pushing. The grasping action offers completeness guarantee under mild assumptions, and planar pushing is an acceleration strategy that moves multiple objects at once. Our main contribution is twofold: (1) We propose a bilevel planning algorithm. Our high-level planner makes efficient, near-optimal choices between pushing and grasping actions based on a cost model. Our low-level planner computes one-step greedy pushing or grasping actions. (2) We propose a novel low-level push planner that can find one-step greedy pushing actions in a semi-discrete search space. The structure of the search space allows us to efficient We show that, for sorting up to $200$ objects, our planner can find near-optimal actions with $10$ seconds of computation on a desktop machine.

ROOct 19, 2020
Robust & Asymptotically Locally Optimal UAV-Trajectory Generation Based on Spline Subdivision

Ruiqi Ni, Teseo Schneider, Daniele Panozzo et al.

Generating locally optimal UAV-trajectories is challenging due to the non-convex constraints of collision avoidance and actuation limits. We present the first local, optimization-based UAV-trajectory generator that simultaneously guarantees the validity and asymptotic optimality for known environments. \textit{Validity:} Given a feasible initial guess, our algorithm guarantees the satisfaction of all constraints throughout the process of optimization. \textit{Asymptotic Optimality:} We use an asymptotic exact piecewise approximation of the trajectory with an automatically adjustable resolution of its discretization. The trajectory converges under refinement to the first-order stationary point of the exact non-convex programming problem. Our method has additional practical advantages including joint optimality in terms of trajectory and time-allocation, and robustness to challenging environments as demonstrated in our experiments.

ROFeb 27, 2020
Multi-Robot Path Planning Using Medial-Axis-Based Pebble-Graph Embedding

Liang He, Zherong Pan, Kiril Solovey et al.

We present a centralized algorithm for labeled, disk-shaped Multi-Robot Path Planning (MPP) in a continuous planar workspace with polygonal boundaries. Our method automatically transform the continuous problem into a discrete, graph-based variant termed the pebble motion problem, which can be solved efficiently. To construct the underlying pebble graph, we identify inscribed circles in the workspace via a medial axis transform and organize robots into layers within each inscribed circle. We show that our layered pebble-graph enables collision-free motions, allowing all graph-restricted MPP instances to be feasible. MPP instances with continuous start and goal positions can then be solved via local navigations that route robots from and to graph vertices. We tested our method on several environments with high robot-packing densities (up to $61.6\%$ of the workspace). For environments with narrow passages, such density violates the well-separated assumptions made by state-of-the-art MPP planners, while our method achieves an average success rate of $83\%$.

ROFeb 4, 2020
Deep Differentiable Grasp Planner for High-DOF Grippers

Min Liu, Zherong Pan, Kai Xu et al.

We present an end-to-end algorithm for training deep neural networks to grasp novel objects. Our algorithm builds all the essential components of a grasping system using a forward-backward automatic differentiation approach, including the forward kinematics of the gripper, the collision between the gripper and the target object, and the metric for grasp poses. In particular, we show that a generalized Q1 grasp metric is defined and differentiable for inexact grasps generated by a neural network, and the derivatives of our generalized Q1 metric can be computed from a sensitivity analysis of the induced optimization problem. We show that the derivatives of the (self-)collision terms can be efficiently computed from a watertight triangle mesh of low-quality. Altogether, our algorithm allows for the computation of grasp poses for high-DOF grippers in an unsupervised mode with no ground truth data, or it improves the results in a supervised mode using a small dataset. Our new learning algorithm significantly simplifies the data preparation for learning-based grasping systems and leads to higher qualities of learned grasps on common 3D shape datasets [7, 49, 26, 25], achieving a 22% higher success rate on physical hardware and a 0.12 higher value on the Q1 grasp quality metric.

GRSep 26, 2019
Realtime Simulation of Thin-Shell Deformable Materials using CNN-Based Mesh Embedding

Qingyang Tan, Zherong Pan, Lin Gao et al.

We address the problem of accelerating thin-shell deformable object simulations by dimension reduction. We present a new algorithm to embed a high-dimensional configuration space of deformable objects in a low-dimensional feature space, where the configurations of objects and feature points have approximate one-to-one mapping. Our key technique is a graph-based convolutional neural network (CNN) defined on meshes with arbitrary topologies and a new mesh embedding approach based on physics-inspired loss term. We have applied our approach to accelerate high-resolution thin shell simulations corresponding to cloth-like materials, where the configuration space has tens of thousands of degrees of freedom. We show that our physics-inspired embedding approach leads to higher accuracy compared with prior mesh embedding methods. Finally, we show that the temporal evolution of the mesh in the feature space can also be learned using a recurrent neural network (RNN) leading to fully learnable physics simulators. After training our learned simulator runs $500-10000\times$ faster and the accuracy is high enough for robot manipulation tasks.

ROSep 12, 2019
New Formulation of Mixed-Integer Conic Programming for Globally Optimal Grasp Planning

Min Liu, Zherong Pan, Kai Xu et al.

We present a two-level branch-and-bound (BB) algorithm to compute the optimal gripper pose that maximizes a grasp metric in a restricted search space. Our method can take the gripper's kinematics feasibility into consideration to ensure that a given gripper can reach the set of grasp points without collisions or predict infeasibility with finite-time termination when no pose exists for a given set of grasp points. Our main technical contribution is a novel mixed-integer conic programming (MICP) formulation for the inverse kinematics of the gripper that uses a small number of binary variables and tightened constraints, which can be efficiently solved via a low-level BB algorithm. Our experiments show that optimal gripper poses for various target objects can be computed taking 20-180 minutes of computation on a desktop machine and the computed grasp quality, in terms of the Q1 metric, is better than those generated using sampling-based planners.

ROJul 20, 2019
Generating Optimal Grasps Under A Stress-Minimizing Metric

Zherong Pan, Xifeng Gao, Dinesh Manocha

We present stress-minimizing (SM) metric, a new metric of grasp qualities. Unlike previous metrics that ignore the material of target objects, we assume that target objects are made of homogeneous isotopic materials. SM metric measures the maximal resistible external wrenches without causing fracture in the target objects. Therefore, SM metric is useful for robot grasping valuable and fragile objects. In this paper, we analyze the properties of this new metric, propose grasp planning algorithms to generate globally optimal grasps maximizing the SM metric, and compare the performance of the SM metric and a conventional metric. Our experiments show that SM metric is aware of the geometries of target objects while the conventional metric are not. We also show that the computational cost of the SM metric is on par with that of the conventional metric.

ROMay 22, 2019
Globally Optimal Joint Search of Topology and Trajectory for Planar Linkages

Zherong Pan, Min Liu, Xifeng Gao et al.

We present a method to find globally optimal topology and trajectory jointly for planar linkages. Planar linkage structures can generate complex end-effector trajectories using only a single rotational actuator, which is very useful in building low-cost robots. We address the problem of searching for the optimal topology and geometry of these structures. However, since topology changes are non-smooth and non-differentiable, conventional gradient-based searches cannot be used. We formulate this problem as a mixed-integer convex programming (MICP) problem, for which a global optimum can be found using the branch-and-bound (BB) algorithm. Compared to existing methods, our experiments show that the proposed approach finds complex linkage structures more efficiently and generates end-effector trajectories more accurately.

ROMar 1, 2019
Generating Grasp Poses for a High-DOF Gripper Using Neural Networks

Min Liu, Zherong Pan, Kai Xu et al.

We present a learning-based method for representing grasp poses of a high-DOF hand using neural networks. Due to redundancy in such high-DOF grippers, there exists a large number of equally effective grasp poses for a given target object, making it difficult for the neural network to find consistent grasp poses. We resolve this ambiguity by generating an augmented dataset that covers many possible grasps for each target object and train our neural networks using a consistency loss function to identify a one-to-one mapping from objects to grasp poses. We further enhance the quality of neural-network-predicted grasp poses using a collision loss function to avoid penetrations. We use an object dataset that combines the BigBIRD Database, the KIT Database, the YCB Database, and the Grasp Dataset to show that our method can generate high-DOF grasp poses with higher accuracy than supervised learning baselines. The quality of the grasp poses is on par with the groundtruth poses in the dataset. In addition, our method is robust and can handle noisy object models such as those constructed from multi-view depth images, allowing our method to be implemented on a 25-DOF Shadow Hand hardware platform.

ROSep 21, 2018
Fast Motion Planning for High-DOF Robot Systems Using Hierarchical System Identification

Biao Jia, Zherong Pan, Dinesh Manocha

We present an efficient algorithm for motion planning and control of a robot system with a high number of degrees-of-freedom. These include high-DOF soft robots or an articulated robot interacting with a deformable environment. Our approach takes into account dynamics constraints and present a novel technique to accelerate the forward dynamic computation using a data-driven method. We precompute the forward dynamic function of the robot system on a hierarchical adaptive grid. Furthermore, we exploit the properties of underactuated robot systems and perform these computations for a few DOFs. We provide error bounds for our approximate forward dynamics computation and use our approach for optimization-based motion planning and reinforcement-learning-based feedback control. Our formulation is used for motion planning of two high DOF robot systems: a high-DOF line-actuated elastic robot arm and an underwater swimming robot operating in water. As compared to prior techniques based on exact dynamic function computation, we observe one to two orders of magnitude improvement in performance.

ROJun 25, 2018
Learning-based Feedback Controller for Deformable Object Manipulation

Biao Jia, Zhe Hu, Zherong Pan et al.

In this paper, we present a general learning-based framework to automatically visual-servo control the position and shape of a deformable object with unknown deformation parameters. The servo-control is accomplished by learning a feedback controller that determines the robotic end-effector's movement according to the deformable object's current status. This status encodes the object's deformation behavior by using a set of observed visual features, which are either manually designed or automatically extracted from the robot's sensor stream. A feedback control policy is then optimized to push the object toward a desired featured status efficiently. The feedback policy can be learned either online or offline. Our online policy learning is based on the Gaussian Process Regression (GPR), which can achieve fast and accurate manipulation and is robust to small perturbations. An offline imitation learning framework is also proposed to achieve a control policy that is robust to large perturbations in the human-robot interaction. We validate the performance of our controller on a set of deformable object manipulation tasks and demonstrate that our method can achieve effective and accurate servo-control for general deformable objects with a wide variety of goal settings.

ROFeb 27, 2018
Cloth Manipulation Using Random-Forest-Based Imitation Learning

Biao Jia, Zherong Pan, Zhe Hu et al.

We present a novel approach for robust manipulation of high-DOF deformable objects such as cloth. Our approach uses a random forest-based controller that maps the observed visual features of the cloth to an optimal control action of the manipulator. The topological structure of this random forest-based controller is determined automatically based on the training data consisting visual features and optimal control actions. This enables us to integrate the overall process of training data classification and controller optimization into an imitation learning (IL) approach. Our approach enables learning of robust control policy for cloth manipulation with guarantees on convergence.We have evaluated our approach on different multi-task cloth manipulation benchmarks such as flattening, folding and twisting. In practice, our approach works well with different deformable features learned based on the specific task or deep learning. Moreover, our controller outperforms a simple or piecewise linear controller in terms of robustness to noise. In addition, our approach is easy to implement and does not require much parameter tuning.

ROSep 13, 2017
Time Integrating Articulated Body Dynamics Using Position-Based Collocation Method

Zherong Pan, Dinesh Manocha

We present a new time integrator for articulated body dynamics. We formulate the governing equations of the dynamics using only the position variables and recast the position-based articulated dynamics as an optimization problem. Our reformulation allows us to integrate the dynamics in a fully implicit manner without computing high-order derivatives. Therefore, under arbitrarily large timestep sizes, the stability of our time integration scheme is guaranteed using an off-the-shelf numerical optimizer. In addition to stability, we show that, similar to the Runge-Kutta method, the accuracy of our time integrator can also be increased arbitrarily by using a high-order collocation method. We provide efficient algorithms to perform time integration using our position-based formulation. We show that each iteration of optimization has a complexity of O(N) using Quasi-Newton method or O(N^2) using Newton's method, where N is the number of links. Finally, our method is highly parallelizable and can be accelerated using a Graphics Processing Unit (GPU). We highlight the efficiency and stability of our method on different benchmarks and compare the performance with prior articulated body dynamics simulation methods based on the Newton-Euler's equation. Our method is stable under a timestep size as large as 0.1s. Using a larger timestep size and less timesteps, our method achieves up to 4 times speedup on a single-core CPU. With GPU acceleration, we observe an additional 3-6 times speedup over a 4-core CPU.

ROSep 12, 2016
Feedback Motion Planning for Liquid Transfer using Supervised Learning

Zherong Pan, Dinesh Manocha

We present a novel motion planning algorithm for transferring a liquid body from a source to a target container. Our approach uses a receding-horizon optimization strategy that takes into account fluid constraints and avoids collisions. In order to efficiently handle the high-dimensional configuration space of a liquid body, we use system identification to learn its dynamics characteristics using a neural network. We generate the training dataset using stochastic optimization in a transfer-problem-specific search space. The runtime feedback motion planner is used for real-time planning and we observe high success rate in our simulated 2D and 3D fluid transfer benchmarks.

CVAug 3, 2016
Detailed Garment Recovery from a Single-View Image

Shan Yang, Tanya Ambert, Zherong Pan et al.

Most recent garment capturing techniques rely on acquiring multiple views of clothing, which may not always be readily available, especially in the case of pre-existing photographs from the web. As an alternative, we pro- pose a method that is able to compute a rich and realistic 3D model of a human body and its outfits from a single photograph with little human in- teraction. Our algorithm is not only able to capture the global shape and geometry of the clothing, it can also extract small but important details of cloth, such as occluded wrinkles and folds. Unlike previous methods using full 3D information (i.e. depth, multi-view images, or sampled 3D geom- etry), our approach achieves detailed garment recovery from a single-view image by using statistical, geometric, and physical priors and a combina- tion of parameter estimation, semantic parsing, shape recovery, and physics- based cloth simulation. We demonstrate the effectiveness of our algorithm by re-purposing the reconstructed garments for virtual try-on and garment transfer applications, as well as cloth animation for digital characters.

ROMar 8, 2016
Motion Planning for Fluid Manipulation using Simplified Dynamics

Zherong Pan, Dinesh Manocha

We present an optimization-based motion planning algorithm to compute a smooth, collision-free trajectory for a manipulator used to transfer a liquid from a source to a target container. We take into account fluid dynamics constraints as part of trajectory computation. In order to avoid the high complexity of exact fluid simulation, we introduce a simplified dynamics model based on physically inspired approximations and system identification. Our optimization approach can incorporate various other constraints such as collision avoidance with the obstacles, kinematic and dynamics constraints of the manipulator, and fluid dynamics characteristics. We demonstrate the performance of our planner on different benchmarks corresponding to various obstacles and container shapes. Furthermore, we also evaluate its accuracy by validating the motion plan using an accurate but computationally costly Navier-Stokes fluid simulation.