Seungsu Kim

LG
h-index22
5papers
59citations
Novelty55%
AI Score44

5 Papers

51.8ROMay 27
ProgVLA: Progress-Aware Robot Manipulation Skill Learning

Seungsu Kim, Jinyoung Choi, Seungmin Baek et al.

We present ProgVLA, a compact vision-language-action (VLA) model designed for reliable robot manipulation under tight compute and memory budgets. The model specifically focuses on efficiently processing long multi-modal sequences by maintaining an explicit representation of task progress over extended horizons. To this end, ProgVLA integrates two key components. First, a multi-modal encoder with a two-stage Perceiver resampling scheme compresses variable-length visual, language, and proprioceptive streams into a fixed set of control-ready context tokens, substantially reducing sequence length while preserving cross-modal grounding. Second, an auxiliary set of progress heads is trained with offline reinforcement learning (RL) objectives to jointly learn critics over normalized remaining-horizon targets. This provides the policy with an internal estimate of task progress and enables advantage- and success-weighted flow-matching imitation learning. On two well-established multi-task robot manipulation benchmarks, a 0.1B-parameter ProgVLA model reaches success rates that are competitive with, and on long-horizon and harder task tiers exceed, substantially larger pretrained baselines. Ablations indicate that the learned context resampler and task-adaptive visual fine-tuning are the largest single contributors, while progress-aware training provides a consistent additional gain that is concentrated on long-horizon and multi-object tasks. We further validate the approach in real-world toy-kitchen environments.

57.3LGMay 12
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

Alberta Longhini, David Emukpere, Jean-Michel Renders et al.

We address the problem of fine-tuning pre-trained generative policies with reinforcement learning (RL) while preserving the multimodality of their action distributions. Existing methods for RL fine-tuning of generative policies (e.g., diffusion policies) improve task performance but often collapse diverse behaviors into a single reward-maximizing mode. To mitigate this issue, we propose an unsupervised mode discovery framework that uncovers latent behavioral modes within generative policies. The discovered modes enable the use of mutual information as an intrinsic reward, regularizing RL fine-tuning to enhance task success while maintaining behavioral diversity. Experiments on robotic manipulation tasks demonstrate that our method consistently outperforms conventional fine-tuning approaches, achieving higher success rates and preserving richer multimodal action distributions.

CVMar 14, 2025
Disentangled Object-Centric Image Representation for Robotic Manipulation

David Emukpere, Romain Deffayet, Bingbing Wu et al.

Learning robotic manipulation skills from vision is a promising approach for developing robotics applications that can generalize broadly to real-world scenarios. As such, many approaches to enable this vision have been explored with fruitful results. Particularly, object-centric representation methods have been shown to provide better inductive biases for skill learning, leading to improved performance and generalization. Nonetheless, we show that object-centric methods can struggle to learn simple manipulation skills in multi-object environments. Thus, we propose DOCIR, an object-centric framework that introduces a disentangled representation for objects of interest, obstacles, and robot embodiment. We show that this approach leads to state-of-the-art performance for learning pick and place skills from visual inputs in multi-object environments and generalizes at test time to changing objects of interest and distractors in the scene. Furthermore, we show its efficacy both in simulation and zero-shot transfer to the real world.

LGMay 31, 2019
Scaffold-based molecular design using graph generative model

Jaechang Lim, Sang-Yeon Hwang, Seungsu Kim et al.

Searching new molecules in areas like drug discovery often starts from the core structures of candidate molecules to optimize the properties of interest. The way as such has called for a strategy of designing molecules retaining a particular scaffold as a substructure. On this account, our present work proposes a scaffold-based molecular generative model. The model generates molecular graphs by extending the graph of a scaffold through sequential additions of vertices and edges. In contrast to previous related models, our model guarantees the generated molecules to retain the given scaffold with certainty. Our evaluation of the model using unseen scaffolds showed the validity, uniqueness, and novelty of generated molecules as high as the case using seen scaffolds. This confirms that the model can generalize the learned chemical rules of adding atoms and bonds rather than simply memorizing the mapping from scaffolds to molecules during learning. Furthermore, despite the restraint of fixing core structures, our model could simultaneously control multiple molecular properties when generating new molecules.

ROJan 3, 2019
From exploration to control: learning object manipulation skills through novelty search and local adaptation

Seungsu Kim, Alexandre Coninx, Stephane Doncieux

Programming a robot to deal with open-ended tasks remains a challenge, in particular if the robot has to manipulate objects. Launching, grasping, pushing or any other object interaction can be simulated but the corresponding models are not reversible and the robot behavior thus cannot be directly deduced. These behaviors are hard to learn without a demonstration as the search space is large and the reward sparse. We propose a method to autonomously generate a diverse repertoire of simple object interaction behaviors in simulation. Our goal is to bootstrap a robot learning and development process with limited information about what the robot has to achieve and how. This repertoire can be exploited to solve different tasks in reality thanks to a proposed adaptation method or could be used as a training set for data-hungry algorithms. The proposed approach relies on the definition of a goal space and generates a repertoire of trajectories to reach attainable goals, thus allowing the robot to control this goal space. The repertoire is built with an off-the-shelf simulation thanks to a quality diversity algorithm. The result is a set of solutions tested in simulation only. It may result in two different problems: (1) as the repertoire is discrete and finite, it may not contain the trajectory to deal with a given situation or (2) some trajectories may lead to a behavior in reality that differs from simulation because of a reality gap. We propose an approach to deal with both issues by using a local linearization of the mapping between the motion parameters and the observed effects. Furthermore, we present an approach to update the existing solutions repertoire with the tests done on the real robot. The approach has been validated on two different experiments on the Baxter robot: a ball launching and a joystick manipulation tasks.