CVSep 1, 2022
FLAME: Free-form Language-based Motion Synthesis & EditingJihoon Kim, Jiseob Kim, Sungjoon Choi
Text-based motion generation models are drawing a surge of interest for their potential for automating the motion-making process in the game, animation, or robot industries. In this paper, we propose a diffusion-based motion synthesis and editing model named FLAME. Inspired by the recent successes in diffusion models, we integrate diffusion-based generative models into the motion domain. FLAME can generate high-fidelity motions well aligned with the given text. Also, it can edit the parts of the motion, both frame-wise and joint-wise, without any fine-tuning. FLAME involves a new transformer-based architecture we devise to better handle motion data, which is found to be crucial to manage variable-length motions and well attend to free-form text. In experiments, we show that FLAME achieves state-of-the-art generation performances on three text-motion datasets: HumanML3D, BABEL, and KIT. We also demonstrate that editing capability of FLAME can be extended to other tasks such as motion prediction or motion in-betweening, which have been previously covered by dedicated models.
ROJun 17, 2023
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic AgentsJeongeun Park, Seungwon Lim, Joonhyung Lee et al.
In this paper, we focus on inferring whether the given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish it between ambiguous or infeasible commands leveraging LLMs with situational aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could lead to a decrease in malfunction and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness, consisting pair of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset, pick-and-place tabletop simulation. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments, i.e., handover scenarios.
ROMay 27
Natural Functional Gradients for Smooth Trajectory OptimizationKisang Park, Chanwoo Kim, Kyungjae Lee et al.
Generating collision-free and smooth motions remains a central challenge in robotic manipulation, particularly in cluttered environments and narrow passages where feasible regions are highly constrained and fragmented. We propose a trajectory optimization framework that performs geometry-aware updates directly in function space using natural functional gradients. The method optimizes a Gaussian-smoothed surrogate objective that regularizes the optimization landscape through smooth trajectory perturbations while preserving trajectory-level structure. Because the updates are defined intrinsically in function space, trajectory regularity can be controlled independently of a particular time discretization. We derive a practical Monte-Carlo estimator of the natural functional gradient that requires only black-box trajectory evaluations, making the method applicable when analytic gradients are unavailable or unreliable due to collision checking and contact-rich simulation. Experiments on constrained robotic manipulation tasks demonstrate that the proposed method improves trajectory feasibility and produces smoother motions than representative planning and trajectory optimization baselines in environments with narrow geometric clearances. Additional results, videos, and implementation details are available at the project page: https://kisangpark.github.io/natural-functional-gradient/
ROSep 19, 2022
Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic AssistantsJeongeun Park, Taerim Yoon, Jejoon Hong et al.
In this paper, we focus on the problem of efficiently locating a target object described with free-form language using a mobile robot equipped with vision sensors (e.g., an RGBD camera). Conventional active visual search predefines a set of objects to search for, rendering these techniques restrictive in practice. To provide added flexibility in active visual searching, we propose a system where a user can enter target commands using free-form language; we call this system Active Visual Search in the Wild (AVSW). AVSW detects and plans to search for a target object inputted by a user through a semantic grid map represented by static landmarks (e.g., desk or bed). For efficient planning of object search patterns, AVSW considers commonsense knowledge-based co-occurrence and predictive uncertainty while deciding which landmarks to visit first. We validate the proposed method with respect to SR (success rate) and SPL (success weighted by path length) in both simulated and real-world environments. The proposed method outperforms previous methods in terms of SPL in simulated scenarios with an average gap of 0.283. We further demonstrate AVSW with a Pioneer-3AT robot in real-world studies.
ROSep 25, 2023
SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation SystemsJoonhyung Lee, Sangbeom Park, Jeongeun Park et al.
Pick-and-place is one of the fundamental tasks in robotics research. However, the attention has been mostly focused on the ``pick'' task, leaving the ``place'' task relatively unexplored. In this paper, we address the problem of placing objects in the context of a teleoperation framework. Particularly, we focus on two aspects of the place task: stability robustness and contextual reasonableness of object placements. Our proposed method combines simulation-driven physical stability verification via real-to-sim and the semantic reasoning capability of large language models. In other words, given place context information (e.g., user preferences, object to place, and current scene information), our proposed method outputs a probability distribution over the possible placement candidates, considering the robustness and reasonableness of the place task. Our proposed method is extensively evaluated in two simulation and one real world environments and we show that our method can greatly increase the physical plausibility of the placement as well as contextual soundness while considering user preferences.
CVOct 27, 2022
Learning Joint Representation of Human Motion and LanguageJihoon Kim, Youngjae Yu, Seungyoun Shin et al.
In this work, we present MoLang (a Motion-Language connecting model) for learning joint representation of human motion and language, leveraging both unpaired and paired datasets of motion and language modalities. To this end, we propose a motion-language model with contrastive learning, empowering our model to learn better generalizable representations of the human motion domain. Empirical results show that our model learns strong representations of human motion data through navigating language modality. Our proposed method is able to perform both action recognition and motion retrieval tasks with a single model where it outperforms state-of-the-art approaches on a number of action recognition benchmarks.
LGFeb 13Code
Formalizing the Sampling Design Space of Diffusion-Based Generative Models via Adaptive Solvers and Wasserstein-Bounded TimestepsSangwoo Jo, Sungjoon Choi
Diffusion-based generative models have achieved remarkable performance across various domains, yet their practical deployment is often limited by high sampling costs. While prior work focuses on training objectives or individual solvers, the holistic design of sampling, specifically solver selection and scheduling, remains dominated by static heuristics. In this work, we revisit this challenge through a geometric lens, proposing SDM, a principled framework that aligns the numerical solver with the intrinsic properties of the diffusion trajectory. By analyzing the ODE dynamics, we show that efficient low-order solvers suffice in early high-noise stages while higher-order solvers can be progressively deployed to handle the increasing non-linearity of later stages. Furthermore, we formalize the scheduling by introducing a Wasserstein-bounded optimization framework. This method systematically derives adaptive timesteps that explicitly bound the local discretization error, ensuring the sampling process remains faithful to the underlying continuous dynamics. Without requiring additional training or architectural modifications, SDM achieves state-of-the-art performance across standard benchmarks, including an FID of 1.93 on CIFAR-10, 2.41 on FFHQ, and 1.98 on AFHQv2, with a reduced number of function evaluations compared to existing samplers. Our code is available at https://github.com/aiimaginglab/sdm.
ROFeb 10, 2023
SOCRATES: Text-based Human Search and Approach using a Robot DogJeongeun Park, Jefferson Silveria, Matthew Pan et al.
In this paper, we propose a SOCratic model for Robots Approaching humans based on TExt System (SOCRATES) focusing on the human search and approach based on free-form textual description; the robot first searches for the target user, then the robot proceeds to approach in a human-friendly manner. In particular, textual descriptions are composed of appearance (e.g., wearing white shirts with black hair) and location clues (e.g., is a student who works with robots). We initially present a Human Search Socratic Model that connects large pre-trained models in the language domain to solve the downstream task, which is searching for the target person based on textual descriptions. Then, we propose a hybrid learning-based framework for generating target-cordial robotic motion to approach a person, consisting of a learning-from-demonstration module and a knowledge distillation module. We validate the proposed searching module via simulation using a virtual mobile robot as well as through real-world experiments involving participants and the Boston Dynamics Spot robot. Furthermore, we analyze the properties of the proposed approaching framework with human participants based on the Robotic Social Attributes Scale (RoSAS)
SENov 13, 2023
Past as a Guide: Leveraging Retrospective Learning for Python Code CompletionSeunggyoon Shin, Seunggyu Chang, Sungjoon Choi
This work presents Past as a Guide (PaG), a simple approach for Large Language Models (LLMs) to improve the coding capabilities by integrating the past history with interactive and iterative code refinements. To be specific, inspired by human cognitive processes, the proposed method enables LLMs to utilize previous programming and debugging experiences to enhance the Python code completion tasks. The framework facilitates LLMs to iteratively refine the Python code based on previous execution and debugging results and optimize learning and reasoning capabilities. The proposed methodology achieved a 92\% pass@1 on HumanEval, demonstrating the potential to advance the field by leveraging retrospection from past experiences and interactive and iterative refinement processes without external correctness indicators.
RODec 3, 2025
Hierarchical Vision Language Action Model Using Success and Failure DemonstrationsJeongeun Park, Jihwan Yoon, Byungwoo Jeon et al.
Prior Vision-Language-Action (VLA) models are typically trained on teleoperated successful demonstrations, while discarding numerous failed attempts that occur naturally during data collection. However, these failures encode where and how policies can be fragile, information that can be exploited to improve robustness. We address this problem by leveraging mixed-quality datasets to learn failure-aware reasoning at planning time. We introduce VINE, a hierarchical vision-language-action model that separates high-level reasoning (System 2) from low-level control (System 1) under a hierarchical reinforcement learning formalism, making failures usable as a structured learning signal rather than noisy supervision. System 2 performs feasibility-guided tree search over a 2D scene-graph abstraction: it proposes subgoal transitions, predicts success probabilities from both successes and failures, and prunes brittle branches before execution, effectively casting plan evaluation as feasibility scoring. The selected subgoal sequence is then passed to System 1, which executes low-level actions without modifying the agent's core skills. Trained entirely from offline teleoperation data, VINE integrates negative experience directly into the decision loop. Across challenging manipulation tasks, this approach consistently improves success rates and robustness, demonstrating that failure data is an essential resource for converting the broad competence of VLAs into robust execution.
ROApr 9
LEGO: Latent-space Exploration for Geometry-aware Optimization of Humanoid Kinematic DesignJihwan Yoon, Taemoon Jeong, Jeongeun Park et al.
Designing robot morphologies and kinematics has traditionally relied on human intuition, with little systematic foundation. Motion-design co-optimization offers a promising path toward automation, but two major challenges remain: (i) the vast, unstructured design space and (ii) the difficulty of constructing task-specific loss functions. We propose a new paradigm that minimizes human involvement by (i) learning the design search space from existing mechanical designs, rather than hand-crafting it, and (ii) defining the loss directly from human motion data via motion retargeting and Procrustes analysis. Using screw-theory-based joint axis representation and isometric manifold learning, we construct a compact, geometry-preserving latent space of humanoid upper body designs in which optimization is tractable. We then solve design optimization in this latent space using gradient-free optimization. Our approach establishes a principled framework for data-driven robot design and demonstrates that leveraging existing designs and human motion can effectively guide the automated discovery of novel robot design.
ROMar 15, 2024
Towards Embedding Dynamic Personas in Interactive Robots: Masquerading Animated Social Kinematics (MASK)Jeongeun Park, Taemoon Jeong, Hyeonseong Kim et al.
This paper presents the design and development of an innovative interactive robotic system to enhance audience engagement using character-like personas. Built upon the foundations of persona-driven dialog agents, this work extends the agent's application to the physical realm, employing robots to provide a more captivating and interactive experience. The proposed system, named the Masquerading Animated Social Kinematic (MASK), leverages an anthropomorphic robot which interacts with guests using non-verbal interactions, including facial expressions and gestures. A behavior generation system based upon a finite-state machine structure effectively conditions robotic behavior to convey distinct personas. The MASK framework integrates a perception engine, a behavior selection engine, and a comprehensive action library to enable real-time, dynamic interactions with minimal human intervention in behavior design. Throughout the user subject studies, we examined whether the users could recognize the intended character in both personality- and film-character-based persona conditions. We conclude by discussing the role of personas in interactive agents and the factors to consider for creating an engaging user experience.
CVMar 19, 2025
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View TransformationGyeongrok Oh, Sungjune Kim, Heeju Ko et al.
The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query resolutions, which inevitably leads to an information loss. Therefore, it is essential to encode and preserve rich visual details within limited query sizes while ensuring a comprehensive representation of 3D occupancy. To this end, we introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context. In particular, the mapping of 2D prototypes onto 3D voxel queries encodes high-level visual geometries and complements the loss of spatial information from reduced query resolutions. Additionally, we design a multi-perspective decoding strategy to efficiently disentangle the densely compressed visual cues into a high-dimensional 3D occupancy scene. Experimental results on both Occ3D and SemanticKITTI benchmarks demonstrate the effectiveness of the proposed method, showing clear improvements over the baselines. More importantly, ProtoOcc achieves competitive performance against the baselines even with 75\% reduced voxel resolution.
ROApr 5
Learning Dexterous Grasping from Sparse Taxonomy GuidanceJuhan Park, Taerim Yoon, Seungmin Kim et al.
Dexterous manipulation requires planning a grasp configuration suited to the object and task, which is then executed through coordinated multi-finger control. However, specifying grasp plans with dense pose or contact targets for every object and task is impractical. Meanwhile, end-to-end reinforcement learning from task rewards alone lacks controllability, making it difficult for users to intervene when failures occur. To this end, we present GRIT, a two-stage framework that learns dexterous control from sparse taxonomy guidance. GRIT first predicts a taxonomy-based grasp specification from the scene and task context. Conditioned on this sparse command, a policy generates continuous finger motions that accomplish the task while preserving the intended grasp structure. Our result shows that certain grasp taxonomies are more effective for specific object geometries. By leveraging this relationship, GRIT improves generalization to novel objects over baselines and achieves an overall success rate of 87.9%. Moreover, real-world experiments demonstrate controllability, enabling grasp strategies to be adjusted through high-level taxonomy selection based on object geometry and task intent.
CVFeb 9, 2022
Conditional Motion In-betweeningJihoon Kim, Taehyun Byun, Seungyoun Shin et al.
Motion in-betweening (MIB) is a process of generating intermediate skeletal movement between the given start and target poses while preserving the naturalness of the motion, such as periodic footstep motion while walking. Although state-of-the-art MIB methods are capable of producing plausible motions given sparse key-poses, they often lack the controllability to generate motions satisfying the semantic contexts required in practical applications. We focus on the method that can handle pose or semantic conditioned MIB tasks using a unified model. We also present a motion augmentation method to improve the quality of pose-conditioned motion generation via defining a distribution over smooth trajectories. Our proposed method outperforms the existing state-of-the-art MIB method in pose prediction errors while providing additional controllability.
LGNov 2, 2021
Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern EstimationJeongeun Park, Seungyoun Shin, Sangheum Hwang et al.
Robust learning methods aim to learn a clean target distribution from noisy and corrupted training data where a specific corruption pattern is often assumed a priori. Our proposed method can not only successfully learn the clean target distribution from a dirty dataset but also can estimate the underlying noise pattern. To this end, we leverage a mixture-of-experts model that can distinguish two different types of predictive uncertainty, aleatoric and epistemic uncertainty. We show that the ability to estimate the uncertainty plays a significant role in elucidating the corruption patterns as these two objectives are tightly intertwined. We also present a novel validation scheme for evaluating the performance of the corruption pattern estimation. Our proposed method is extensively assessed in terms of both robustness and corruption pattern estimation through a number of domains, including computer vision and natural language processing.
ROSep 27, 2021
Semi-Autonomous Teleoperation via Learning Non-Prehensile Manipulation SkillsSangbeom Park, Yoonbyung Chai, Sunghyun Park et al.
In this paper, we present a semi-autonomous teleoperation framework for a pick-and-place task using an RGB-D sensor. In particular, we assume that the target object is located in a cluttered environment where both prehensile grasping and non-prehensile manipulation are combined for efficient teleoperation. A trajectory-based reinforcement learning is utilized for learning the non-prehensile manipulation to rearrange the objects for enabling direct grasping. From the depth image of the cluttered environment and the location of the goal object, the learned policy can provide multiple options of non-prehensile manipulation to the human operator. We carefully design a reward function for the rearranging task where the policy is trained in a simulational environment. Then, the trained policy is transferred to a real-world and evaluated in a number of real-world experiments with the varying number of objects where we show that the proposed method outperforms manual keyboard control in terms of the time duration for the grasping.
ROSep 23, 2021
Semi-Supervised Imitation Learning with Mixed Qualities of Demonstrations for Autonomous DrivingGunmin Lee, Wooseok Oh, Seungyoun Shin et al.
In this paper, we consider the problem of autonomous driving using imitation learning in a semi-supervised manner. In particular, both labeled and unlabeled demonstrations are leveraged during training by estimating the quality of each unlabeled demonstration. If the provided demonstrations are corrupted and have a low signal-to-noise ratio, the performance of the imitation learning agent can be degraded significantly. To mitigate this problem, we propose a method called semi-supervised imitation learning (SSIL). SSIL first learns how to discriminate and evaluate each state-action pair's reliability in unlabeled demonstrations by assigning higher reliability values to demonstrations similar to labeled expert demonstrations. This reliability value is called leverage. After this discrimination process, both labeled and unlabeled demonstrations with estimated leverage values are utilized while training the policy in a semi-supervised manner. The experimental results demonstrate the validity of the proposed algorithm using unlabeled trajectories with mixed qualities. Moreover, the hardware experiments using an RC car are conducted to show that the proposed method can be applied to real-world applications.
ROSep 16, 2021
Towards Defensive Autonomous Driving: Collecting and Probing Driving Demonstrations of Mixed QualitiesJeongwoo Oh, Gunmin Lee, Jeongeun Park et al.
Designing or learning an autonomous driving policy is undoubtedly a challenging task as the policy has to maintain its safety in all corner cases. In order to secure safety in autonomous driving, the ability to detect hazardous situations, which can be seen as an out-of-distribution (OOD) detection problem, becomes crucial. However, most conventional datasets only provide expert driving demonstrations, although some non-expert or uncommon driving behavior data are needed to implement a safety guaranteed autonomous driving platform. To this end, we present a novel dataset called the R3 Driving Dataset, composed of driving data with different qualities. The dataset categorizes abnormal driving behaviors into eight categories and 369 different detailed situations. The situations include dangerous lane changes and near-collision situations. To further enlighten how these abnormal driving behaviors can be detected, we utilize different uncertainty estimation and anomaly detection methods to the proposed dataset. From the results of the proposed experiment, it can be inferred that by using both uncertainty estimation and anomaly detection, most of the abnormal cases in the proposed dataset can be discriminated. The dataset of this paper can be downloaded from https://rllab-snu.github.io/projects/R3-Driving-Dataset/doc.html.
ROSep 15, 2021
Learning Robot Structure and Motion Embeddings using Graph Neural NetworksJ. Taery Kim, Jeongeun Park, Sungjoon Choi et al.
We propose a learning framework to find the representation of a robot's kinematic structure and motion embedding spaces using graph neural networks (GNN). Finding a compact and low-dimensional embedding space for complex phenomena is a key for understanding its behaviors, which may lead to a better learning performance, as we observed in other domains of images or languages. However, although numerous robotics applications deal with various types of data, the embedding of the generated data has been relatively less studied by roboticists. To this end, our work aims to learn embeddings for two types of robotic data: the robot's design structure, such as links, joints, and their relationships, and the motion data, such as kinematic joint positions. Our method exploits the tree structure of the robot to train appropriate embeddings to the given robot data. To avoid overfitting, we formulate multi-task learning to find a general representation of the embedding spaces. We evaluate the proposed learning method on a robot with a simple linear structure and visualize the learned embeddings using t-SNE. We also study a few design choices of the learning framework, such as network architectures and message passing schemes.
ROMar 11, 2021
Self-Supervised Motion Retargeting with Safety GuaranteeSungjoon Choi, Min Jae Song, Hyemin Ahn et al.
In this paper, we present self-supervised shared latent embedding (S3LE), a data-driven motion retargeting method that enables the generation of natural motions in humanoid robots from motion capture data or RGB videos. While it requires paired data consisting of human poses and their corresponding robot configurations, it significantly alleviates the necessity of time-consuming data-collection via novel paired data generating processes. Our self-supervised learning procedure consists of two steps: automatically generating paired data to bootstrap the motion retargeting, and learning a projection-invariant mapping to handle the different expressivity of humans and humanoid robots. Furthermore, our method guarantees that the generated robot pose is collision-free and satisfies position limits by utilizing nonparametric regression in the shared latent space. We demonstrate that our method can generate expressive robotic motions from both the CMU motion capture database and YouTube videos.
CVDec 16, 2020
Polyblur: Removing mild blur by polynomial reblurringMauricio Delbracio, Ignacio Garcia-Dorado, Sungjoon Choi et al.
We present a highly efficient blind restoration method to remove mild blur in natural images. Contrary to the mainstream, we focus on removing slight blur that is often present, damaging image quality and commonly generated by small out-of-focus, lens blur, or slight camera motion. The proposed algorithm first estimates image blur and then compensates for it by combining multiple applications of the estimated blur in a principled way. To estimate blur we introduce a simple yet robust algorithm based on empirical observations about the distribution of the gradient in sharp natural images. Our experiments show that, in the context of mild blur, the proposed method outperforms traditional and modern blind deblurring methods and runs in a fraction of the time. Our method can be used to blindly correct blur before applying off-the-shelf deep super-resolution methods leading to superior results than other highly complex and computationally demanding techniques. The proposed method estimates and removes mild blur from a 12MP image on a modern mobile phone in a fraction of a second.
ROMay 28, 2018
Interactive Text2Pickup Network for Natural Language based Human-Robot CollaborationHyemin Ahn, Sungjoon Choi, Nuri Kim et al.
In this paper, we propose the Interactive Text2Pickup (IT2P) network for human-robot collaboration which enables an effective interaction with a human user despite the ambiguity in user's commands. We focus on the task where a robot is expected to pick up an object instructed by a human, and to interact with the human when the given instruction is vague. The proposed network understands the command from the human user and estimates the position of the desired object first. To handle the inherent ambiguity in human language commands, a suitable question which can resolve the ambiguity is generated. The user's answer to the question is combined with the initial command and given back to the network, resulting in more accurate estimation. The experiment results show that given unambiguous commands, the proposed method can estimate the position of the requested object with an accuracy of 98.49% based on our test dataset. Given ambiguous language commands, we show that the accuracy of the pick up task increases by 1.94 times after incorporating the information obtained from the interaction.
LGMay 22, 2018
Maximum Causal Tsallis Entropy Imitation LearningKyungjae Lee, Sungjoon Choi, Songhwai Oh
In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by assigning zero probability. Second, we prove that an MCTE problem is equivalent to robust Bayes estimation in the sense of the Brier score. Third, we propose a maximum causal Tsallis entropy imitation learning (MCTEIL) algorithm with a sparse mixture density network (sparse MDN) by modeling mixture weights using a sparsemax distribution. In particular, we show that the causal Tsallis entropy of an MDN encourages exploration and efficient mixture utilization while Boltzmann Gibbs entropy is less effective. We validate the proposed method in two simulation studies and MCTEIL outperforms existing imitation learning methods in terms of average returns and learning multi-modal policies.
LGMay 16, 2018
Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density NetworksSungjoon Choi, Sanghoon Hong, Kyungjae Lee et al.
In this paper, we focus on weakly supervised learning with noisy training data for both classification and regression problems.We assume that the training outputs are collected from a mixture of a target and correlated noise distributions.Our proposed method simultaneously estimates the target distribution and the quality of each data which is defined as the correlation between the target and data generating distributions.The cornerstone of the proposed method is a Cholesky Block that enables modeling dependencies among mixture distributions in a differentiable manner where we maintain the distribution over the network weights.We first provide illustrative examples in both regression and classification tasks to show the effectiveness of the proposed method.Then, the proposed method is extensively evaluated in a number of experiments where we show that it constantly shows comparable or superior performances compared to existing baseline methods in the handling of noisy data.
CVFeb 16, 2018
Fast, Trainable, Multiscale DenoisingSungjoon Choi, John Isidoro, Pascal Getreuer et al.
Denoising is a fundamental imaging problem. Versatile but fast filtering has been demanded for mobile camera systems. We present an approach to multiscale filtering which allows real-time applications on low-powered devices. The key idea is to learn a set of kernels that upscales, filters, and blends patches of different scales guided by local structure analysis. This approach is trainable so that learned filters are capable of treating diverse noise patterns and artifacts. Experimental results show that the presented approach produces comparable results to state-of-the-art algorithms while processing time is orders of magnitude faster.
CVNov 29, 2017
BLADE: Filter Learning for General Purpose Computational PhotographyPascal Getreuer, Ignacio Garcia-Dorado, John Isidoro et al.
The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of problems in computational photography. We show applications to operations which may appear in a camera pipeline including denoising, demosaicing, and stylization.
LGSep 19, 2017
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement LearningKyungjae Lee, Sungjoon Choi, Songhwai Oh
In this paper, a sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed.The proposed policy regularization induces a sparse and multi-modal optimal policy distribution of a sparse MDP. The full mathematical analysis of the proposed sparse MDP is provided.We first analyze the optimality condition of a sparse MDP. Then, we propose a sparse value iteration method which solves a sparse MDP and then prove the convergence and optimality of sparse value iteration using the Banach fixed point theorem. The proposed sparse MDP is compared to soft MDPs which utilize causal entropy regularization. We show that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP increases logarithmically with respect to the number of actions, where this performance error is caused by the introduced regularization term. In experiments, we apply sparse MDPs to reinforcement learning problems. The proposed method outperforms existing methods in terms of the convergence speed and performance.
ROSep 11, 2017
A Nonparametric Motion Flow Model for Human Robot CooperationSungjoon Choi, Kyungjae Lee, H. Andy Park et al.
In this paper, we present a novel nonparametric motion flow model that effectively describes a motion trajectory of a human and its application to human robot cooperation. To this end, motion flow similarity measure which considers both spatial and temporal properties of a trajectory is proposed by utilizing the mean and variance functions of a Gaussian process. We also present a human robot cooperation method using the proposed motion flow model. Given a set of interacting trajectories of two workers, the underlying reward function of cooperating behaviors is optimized by using the learned motion description as an input to the reward function where a stochastic trajectory optimization method is used to control a robot. The presented human robot cooperation method is compared with the state-of-the-art algorithm, which utilizes a mixture of interaction primitives (MIP), in terms of the RMS error between generated and target trajectories. While the proposed method shows comparable performance with the MIP when the full observation of human demonstrations is given, it shows superior performance with respect to given partial trajectory information.
CVSep 3, 2017
Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance ModelingSungjoon Choi, Kyungjae Lee, Sungbin Lim et al.
In this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors. The proposed uncertainty acquisition can be done with a single forward path without Monte Carlo sampling and is suitable for real-time robotics applications. The properties of the proposed uncertainty measure are analyzed through three different synthetic examples, absence of data, heavy measurement noise, and composition of functions scenarios. We show that each case can be distinguished using the proposed uncertainty measure and presented an uncertainty-aware learn- ing from demonstration method of an autonomous driving using this property. The proposed uncertainty-aware learning from demonstration method outperforms other compared methods in terms of safety using a complex real-world driving dataset.
CVMar 31, 2017
Unsupervised Holistic Image Generation from Key Local PatchesDonghoon Lee, Sangdoo Yun, Sungjoon Choi et al.
We introduce a new problem of generating an image based on a small number of key local patches without any geometric prior. In this work, key local patches are defined as informative regions of the target object or scene. This is a challenging problem since it requires generating realistic images and predicting locations of parts at the same time. We construct adversarial networks to tackle this problem. A generator network generates a fake image as well as a mask based on the encoder-decoder framework. On the other hand, a discriminator network aims to detect fake images. The network is trained with three losses to consider spatial, appearance, and adversarial information. The spatial loss determines whether the locations of predicted parts are correct. Input patches are restored in the output image without much modification due to the appearance loss. The adversarial loss ensures output images are realistic. The proposed network is trained without supervisory signals since no labels of key parts are required. Experimental results on six datasets demonstrate that the proposed algorithm performs favorably on challenging objects and scenes.
ROAug 12, 2016
Density Matching Reward LearningSungjoon Choi, Kyungjae Lee, Andy Park et al.
In this paper, we focus on the problem of inferring the underlying reward function of an expert given demonstrations, which is often referred to as inverse reinforcement learning (IRL). In particular, we propose a model-free density-based IRL algorithm, named density matching reward learning (DMRL), which does not require model dynamics. The performance of DMRL is analyzed theoretically and the sample complexity is derived. Furthermore, the proposed DMRL is extended to handle nonlinear IRL problems by assuming that the reward function is in the reproducing kernel Hilbert space (RKHS) and kernel DMRL (KDMRL) is proposed. The parameters for KDMRL can be computed analytically, which greatly reduces the computation time. The performance of KDMRL is extensively evaluated in two sets of experiments: grid world and track driving experiments. In grid world experiments, the proposed KDMRL method is compared with both model-based and model-free IRL methods and shows superior performance on a nonlinear reward setting and competitive performance on a linear reward setting in terms of expected value differences. Then we move on to more realistic experiments of learning different driving styles for autonomous navigation in complex and dynamic tracks using KDMRL and receding horizon control.
CVFeb 8, 2016
A Large Dataset of Object ScansSungjoon Choi, Qian-Yi Zhou, Stephen Miller et al.
We have created a dataset of more than ten thousand 3D scans of real objects. To create the dataset, we recruited 70 operators, equipped them with consumer-grade mobile 3D scanning setups, and paid them to scan objects in their environments. The operators scanned objects of their choosing, outside the laboratory and without direct supervision by computer vision professionals. The result is a large and diverse collection of object scans: from shoes, mugs, and toys to grand pianos, construction vehicles, and large outdoor sculptures. We worked with an attorney to ensure that data acquisition did not violate privacy constraints. The acquired data was irrevocably placed in the public domain and is available freely at http://redwood-data.org/3dscan .