CVMay 28
ParCo-SDF: Learning Prior-Free Partial-to-Complete Signed Distance Fields of Deformable ObjectsDeokmin Hwang, Minseok Song, Daehyung Park
This study addresses the partial-to-complete geometry reconstruction of deformable objects (DOs) from point-cloud observations toward precise DO manipulation. Recent DO reconstruction approaches often adopt implicit neural representations (INRs) to model continuous surfaces as well as capture structural variability. However, these methods typically rely on object-specific shape priors that improve training stability and limit generalization. To figure it out, we introduce ParCo-SDF, a two-stage partial-to-complete signed distance field (SDF) reconstruction framework consisting of temporal geometry encoding followed by FiLM-conditioned SDF prediction. The temporal encoder captures structural similarity across DO sequence, enabling prior-free stable training. FiLM-based conditioning preserves reconstruction expressivity while reducing network complexity. We evaluate the proposed method against a state-of-the-art DO surface reconstruction baseline on a rubber band manipulation dataset, demonstrating robust and high-fidelity reconstruction under severe occlusions.
CVMay 12
Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular ControllersSanghyeok Nam, Byoungjun Kim, Daehyung Park et al.
Generating physically plausible dynamic motions of human-object interaction (HOI) remains challenging, mainly due to existing HOI datasets limited to static interactions, and pretrained agents capable of either dynamic full-body motions without objects or static HOI motions. Recent works such as InsActor and CLoSD generate HOI motions in planning and execution stages, are yet limited to either static or short-term contacts e.g. striking. In this work, we propose a framework that fulfills dynamic and long-term interaction motions such as running while holding a table, by combining pretrained motion priors and imitation agents in planning and execution stages. In the planning stage, we augment HOI datasets with dynamic priors from a pretrained human motion diffusion model, followed by object trajectory generation. This plans dynamic HOI sequences. In the execution stage, a composer network blends actions of pretrained imitation agents specialized either for dynamic human motions or static HOI motions, enabling spatio-temporal composition of their complementary skills. Our method over relevant prior-arts consistently improves success rates while maintaining interaction for dynamic HOI tasks. Furthermore, blending pretrained experts with our composer achieves competitive performance in significantly reduced training time. Ablation studies validate the effectiveness of our augmentation and composer blending.
ROFeb 6
SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained EnvironmentsKeonyoung Koh, Moonkyeong Jung, Samuel Seungsup Lee et al.
We address the over-constrained planning problem in semi-static environments. The planning objective is to find a best-effort solution that avoids all hard constraint regions while minimally traversing the least risky areas. Conventional methods often rely on pre-defined area costs, limiting generalizations. Further, the spatial continuity of navigation spaces makes it difficult to identify regions that are passable without overestimation. To overcome these challenges, we propose SuReNav, a superpixel graph-based constraint relaxation and navigation method that imitates human-like safe and efficient navigation. Our framework consists of three components: 1) superpixel graph map generation with regional constraints, 2) regional-constraint relaxation using graph neural network trained on human demonstrations for safe and efficient navigation, and 3) interleaving relaxation, planning, and execution for complete navigation. We evaluate our method against state-of-the-art baselines on 2D semantic maps and 3D maps from OpenStreetMap, achieving the highest human-likeness score of complete navigation while maintaining a balanced trade-off between efficiency and safety. We finally demonstrate its scalability and generalization performance in real-world urban navigation with a quadruped robot, Spot.
ROMay 9
A Visuo-Tactile Data Collection System with Haptic Feedback for Coarse-to-Fine Imitation LearningYeseung Kim, Nayoung Oh, Jun Park et al.
We present a visuo-tactile data-collection system that generates temporally structured, contact-rich demonstrations for imitation learning. Conventional systems often decouple the operator from contact forces, which hinders the demonstration of subtle force modulation. Our system introduces a direct-drive gripper that the operator actuates with the fingers, preserving natural haptic feedback. Integrated visual sensors and custom tactile arrays capture image streams and contact geometry. A handle-mounted push button enables the operator to annotate the task's temporal structure in real time by marking task-critical regions. By fusing in-hand force perception with in-situ temporal annotation, the system produces multimodal datasets designed for coarse-to-fine learning algorithms that exploit structural task knowledge, enabling the development of high-quality manipulation policies.
ROFeb 2, 2024
LINGO-Space: Language-Conditioned Incremental Grounding for SpaceDohyun Kim, Nayoung Oh, Deokmin Hwang et al.
We aim to solve the problem of spatially localizing composite instructions referring to space: space grounding. Compared to current instance grounding, space grounding is challenging due to the ill-posedness of identifying locations referred to by discrete expressions and the compositional ambiguity of referring expressions. Therefore, we propose a novel probabilistic space-grounding methodology (LINGO-Space) that accurately identifies a probabilistic distribution of space being referred to and incrementally updates it, given subsequent referring expressions leveraging configurable polar distributions. Our evaluations show that the estimation using polar distributions enables a robot to ground locations successfully through $20$ table-top manipulation benchmark tests. We also show that updating the distribution helps the grounding method accurately narrow the referring space. We finally demonstrate the robustness of the space grounding with simulated manipulation and real quadruped robot navigation tasks. Code and videos are available at https://lingo-space.github.io.
ROMay 1, 2025
Implicit Neural-Representation Learning for Elastic Deformable-Object ManipulationsMinseok Song, JeongHo Ha, Bonggyeong Park et al.
We aim to solve the problem of manipulating deformable objects, particularly elastic bands, in real-world scenarios. However, deformable object manipulation (DOM) requires a policy that works on a large state space due to the unlimited degree of freedom (DoF) of deformable objects. Further, their dense but partial observations (e.g., images or point clouds) may increase the sampling complexity and uncertainty in policy learning. To figure it out, we propose a novel implicit neural-representation (INR) learning for elastic DOMs, called INR-DOM. Our method learns consistent state representations associated with partially observable elastic objects reconstructing a complete and implicit surface represented as a signed distance function. Furthermore, we perform exploratory representation fine-tuning through reinforcement learning (RL) that enables RL algorithms to effectively learn exploitable representations while efficiently obtaining a DOM policy. We perform quantitative and qualitative analyses building three simulated environments and real-world manipulation studies with a Franka Emika Panda arm. Videos are available at http://inr-dom.github.io.
ROMar 26, 2021
Reactive Task and Motion Planning under Temporal Logic SpecificationsShen Li, Daehyung Park, Yoonchang Sung et al.
We present a task-and-motion planning (TAMP) algorithm robust against a human operator's cooperative or adversarial interventions. Interventions often invalidate the current plan and require replanning on the fly. Replanning can be computationally expensive and often interrupts seamless task execution. We introduce a dynamically reconfigurable planning methodology with behavior tree-based control strategies toward reactive TAMP, which takes the advantage of previous plans and incremental graph search during temporal logic-based reactive synthesis. Our algorithm also shows efficient recovery functionalities that minimize the number of replanning steps. Finally, our algorithm produces a robust, efficient, and complete TAMP solution. Our experimental results show the algorithm results in superior manipulation performance in both simulated and real-world tasks.
ROApr 7, 2019
Active Robot-Assisted Feeding with a General-Purpose Mobile Manipulator: Design, Evaluation, and Lessons LearnedDaehyung Park, Yuuna Hoshi, Harshal P. Mahajan et al.
Eating is an essential activity of daily living (ADL) for staying healthy and living at home independently. Although numerous assistive devices have been introduced, many people with disabilities are still restricted from independent eating due to the devices' physical or perceptual limitations. In this work, we present a new meal-assistance system and evaluations of this system with people with motor impairments. We also discuss learned lessons and design insights based on the evaluations. The meal-assistance system uses a general-purpose mobile manipulator, a Willow Garage PR2, which has the potential to serve as a versatile form of assistive technology. Our active feeding framework enables the robot to autonomously deliver food to the user's mouth, reducing the need for head movement by the user. The user interface, visually-guided behaviors, and safety tools allow people with severe motor impairments to successfully use the system. We evaluated our system with a total of 10 able-bodied participants and 9 participants with motor impairments. Both groups of participants successfully ate various foods using the system and reported high rates of success for the system's autonomous behaviors. In general, participants who operated the system reported that it was comfortable, safe, and easy-to-use.
ROApr 21, 2018
3D Human Pose Estimation on a Configurable Bed from a Pressure ImageHenry M. Clever, Ariel Kapusta, Daehyung Park et al.
Robots have the potential to assist people in bed, such as in healthcare settings, yet bedding materials like sheets and blankets can make observation of the human body difficult for robots. A pressure-sensing mat on a bed can provide pressure images that are relatively insensitive to bedding materials. However, prior work on estimating human pose from pressure images has been restricted to 2D pose estimates and flat beds. In this work, we present two convolutional neural networks to estimate the 3D joint positions of a person in a configurable bed from a single pressure image. The first network directly outputs 3D joint positions, while the second outputs a kinematic model that includes estimated joint angles and limb lengths. We evaluated our networks on data from 17 human participants with two bed configurations: supine and seated. Our networks achieved a mean joint position error of 77 mm when tested with data from people outside the training set, outperforming several baselines. We also present a simple mechanical model that provides insight into ambiguity associated with limbs raised off of the pressure mat, and demonstrate that Monte Carlo dropout can be used to estimate pose confidence in these situations. Finally, we provide a demonstration in which a mobile manipulator uses our network's estimated kinematic model to reach a location on a person's body in spite of the person being seated in a bed and covered by a blanket.
RONov 2, 2017
A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational AutoencoderDaehyung Park, Yuuna Hoshi, Charles C. Kemp
The detection of anomalous executions is valuable for reducing potential hazards in assistive manipulation. Multimodal sensory signals can be helpful for detecting a wide range of anomalies. However, the fusion of high-dimensional and heterogeneous modalities is a challenging problem. We introduce a long short-term memory based variational autoencoder (LSTM-VAE) that fuses signals and reconstructs their expected distribution. We also introduce an LSTM-VAE-based detector using a reconstruction-based anomaly score and a state-based threshold. For evaluations with 1,555 robot-assisted feeding executions including 12 representative types of anomalies, our detector had a higher area under the receiver operating characteristic curve (AUC) of 0.8710 than 5 other baseline detectors from the literature. We also show the multimodal fusion through the LSTM-VAE is effective by comparing our detector with 17 raw sensory signals versus 4 hand-engineered features.
ROMay 25, 2016
Towards Assistive Feeding with a General-Purpose Mobile ManipulatorDaehyung Park, You Keun Kim, Zackory M. Erickson et al.
General-purpose mobile manipulators have the potential to serve as a versatile form of assistive technology. However, their complexity creates challenges, including the risk of being too difficult to use. We present a proof-of-concept robotic system for assistive feeding that consists of a Willow Garage PR2, a high-level web-based interface, and specialized autonomous behaviors for scooping and feeding yogurt. As a step towards use by people with disabilities, we evaluated our system with 5 able-bodied participants. All 5 successfully ate yogurt using the system and reported high rates of success for the system's autonomous behaviors. Also, Henry Evans, a person with severe quadriplegia, operated the system remotely to feed an able-bodied person. In general, people who operated the system reported that it was easy to use, including Henry. The feeding system also incorporates corrective actions designed to be triggered either autonomously or by the user. In an offline evaluation using data collected with the feeding system, a new version of our multimodal anomaly detection system outperformed prior versions.