ROOct 13, 2022
Sample Efficient Dynamics Learning for Symmetrical Legged Robots:Leveraging Physics Invariance and Geometric SymmetriesJee-eun Lee, Jaemin Lee, Tirthankar Bandyopadhyay et al.
Model generalization of the underlying dynamics is critical for achieving data efficiency when learning for robot control. This paper proposes a novel approach for learning dynamics leveraging the symmetry in the underlying robotic system, which allows for robust extrapolation from fewer samples. Existing frameworks that represent all data in vector space fail to consider the structured information of the robot, such as leg symmetry, rotational symmetry, and physics invariance. As a result, these schemes require vast amounts of training data to learn the system's redundant elements because they are learned independently. Instead, we propose considering the geometric prior by representing the system in symmetrical object groups and designing neural network architecture to assess invariance and equivariance between the objects. Finally, we demonstrate the effectiveness of our approach by comparing the generalization to unseen data of the proposed model and the existing models. We also implement a controller of a climbing robot based on learned inverse dynamics models. The results show that our method generates accurate control inputs that help the robot reach the desired state while requiring less training data than existing methods.
CVNov 20, 2023
A Multi-In-Single-Out Network for Video Frame Interpolation without Optical FlowJaemin Lee, Minseok Seo, Sangwoo Lee et al.
In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.
CVNov 26, 2025
SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene UnderstandingTae-Min Choi, Tae Kyeong Jeong, Garam Kim et al.
Recent advances in multimodal large language models (LLMs) have highlighted their potential for medical and surgical applications. However, existing surgical datasets predominantly adopt a Visual Question Answering (VQA) format with heterogeneous taxonomies and lack support for pixel-level segmentation, limiting consistent evaluation and applicability. We present SurgMLLMBench, a unified multimodal benchmark explicitly designed for developing and evaluating interactive multimodal LLMs for surgical scene understanding, including the newly collected Micro-surgical Artificial Vascular anastomosIS (MAVIS) dataset. It integrates pixel-level instrument segmentation masks and structured VQA annotations across laparoscopic, robot-assisted, and micro-surgical domains under a unified taxonomy, enabling comprehensive evaluation beyond traditional VQA tasks and richer visual-conversational interactions. Extensive baseline experiments show that a single model trained on SurgMLLMBench achieves consistent performance across domains and generalizes effectively to unseen datasets. SurgMLLMBench will be publicly released as a robust resource to advance multimodal surgical AI research, supporting reproducible evaluation and development of interactive surgical reasoning models.
AINov 25, 2025Code
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI AgentsHaebin Seong, Sungmin Kim, Yongjun Cho et al.
While current navigation benchmarks prioritize task success in simplified settings, they neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data - such as SEC filings and AIS injury reports - with Isaac Sim's detailed collision and cargo dynamics, CostNav transcends simple task completion to accurately evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Our evaluation of rule-based Nav2 navigation shows that current approaches are not economically viable: the contribution margin is -22.81/run (AMCL) and -12.87/run (GPS), resulting in no break-even point. We challenge the community to develop navigation policies that achieve economic viability on CostNav. We remain method-agnostic, evaluating success solely on the metric of cost rather than the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
CVJul 18, 2022
Towards the Human Global Context: Does the Vision-Language Model Really Judge Like a Human Being?Sangmyeong Woh, Jaemin Lee, Ho Joong Kim et al.
As computer vision and NLP make progress, Vision-Language(VL) is becoming an important area of research. Despite the importance, evaluation metrics of the research domain is still at a preliminary stage of development. In this paper, we propose a quantitative metric "Equivariance Score" and evaluation dataset "Human Puzzle" to assess whether a VL model is understanding an image like a human. We observed that the VL model does not interpret the overall context of an input image but instead shows biases toward a specific object or shape that forms the local context. We aim to quantitatively measure a model's performance in understanding context. To verify the current existing VL model's capability, we sliced the original input image into pieces and randomly placed them, distorting the global context of the image. Our paper discusses each VL model's level of interpretation on global context and addresses how the structural characteristics influenced the results.
CVAug 10, 2021
Exploiting Features with Split-and-Share ModuleJaemin Lee, Minseok Seo, Jongchan Park et al.
Deep convolutional neural networks (CNNs) have shown state-of-the-art performances in various computer vision tasks. Advances on CNN architectures have focused mainly on designing convolutional blocks of the feature extractors, but less on the classifiers that exploit extracted features. In this work, we propose Split-and-Share Module (SSM),a classifier that splits a given feature into parts, which are partially shared by multiple sub-classifiers. Our intuition is that the more the features are shared, the more common they will become, and SSM can encourage such structural characteristics in the split features. SSM can be easily integrated into any architecture without bells and whistles. We have extensively validated the efficacy of SSM on ImageNet-1K classification task, andSSM has shown consistent and significant improvements over baseline architectures. In addition, we analyze the effect of SSM using the Grad-CAM visualization.
ROSep 13, 2020
MPC-Based Hierarchical Task Space Control of Underactuated and Constrained Robots for Execution of Multiple TasksJaemin Lee, Seung Hyeon Bang, Efstathios Bakolas et al.
This paper proposes an MPC-based controller to efficiently execute multiple hierarchical tasks for underactuated and constrained robotic systems. Existing task-space controllers or whole-body controllers solve instantaneous optimization problems given task trajectories and the robot plant dynamics. However, the task-space control method we propose here relies on the prediction of future state trajectories and the corresponding costs-to-go terms over a finite time-horizon for computing control commands. We employ acceleration energy error as the performance index for the optimization problem and extend it over the finite-time horizon of our MPC. Our approach employs quadratically constrained quadratic programming, which includes quadratic constraints to handle multiple hierarchical tasks, and is computationally more efficient than nonlinear MPC-based approaches that rely on nonlinear programming. We validate our approach using numerical simulations of a new type of robot manipulator system, which contains underactuated and constrained mechanical structures.
ROSep 5, 2020
BP-RRT: Barrier Pair Synthesis for Temporal Logic Motion PlanningBinghan He, Jaemin Lee, Ufuk Topcu et al.
For a nonlinear system (e.g. a robot) with its continuous state space trajectories constrained by a linear temporal logic specification, the synthesis of a low-level controller for mission execution often results in a non-convex optimization problem. We devise a new algorithm to solve this type of non-convex problems by formulating a rapidly-exploring random tree of barrier pairs, with each barrier pair composed of a quadratic barrier function and a full state feedback controller. The proposed method employs a rapid-exploring random tree to deal with the non-convex constraints and uses barrier pairs to fulfill the local convex constraints. As such, the method solves control problems fulfilling the required transitions of an automaton in order to satisfy given linear temporal logic constraints. At the same time it synthesizes locally optimal controllers in order to transition between the regions corresponding to the alphabet of the automaton. We demonstrate this new algorithm on a simulation of a two linkage manipulator robot.
CVJun 21, 2020
Sequential Feature Filtering ClassifierMinseok Seo, Jaemin Lee, Jongchan Park et al.
We propose Sequential Feature Filtering Classifier (FFC), a simple but effective classifier for convolutional neural networks (CNNs). With sequential LayerNorm and ReLU, FFC zeroes out low-activation units and preserves high-activation units. The sequential feature filtering process generates multiple features, which are fed into a shared classifier for multiple outputs. FFC can be applied to any CNNs with a classifier, and significantly improves performances with negligible overhead. We extensively validate the efficacy of FFC on various tasks: ImageNet-1K classification, MS COCO detection, Cityscapes segmentation, and HMDB51 action recognition. Moreover, we empirically show that FFC can further improve performances upon other techniques, including attention modules and augmentation techniques. The code and models will be publicly available.
LGFeb 14, 2020
Deep Attentive Study Session Dropout Prediction in Mobile Learning EnvironmentYoungnam Lee, Dongmin Shin, HyunBin Loh et al.
Student dropout prediction provides an opportunity to improve student engagement, which maximizes the overall effectiveness of learning experiences. However, researches on student dropout were mainly conducted on school dropout or course dropout, and study session dropout in a mobile learning environment has not been considered thoroughly. In this paper, we investigate the study session dropout prediction problem in a mobile learning environment. First, we define the concept of the study session, study session dropout and study session dropout prediction task in a mobile learning environment. Based on the definitions, we propose a novel Transformer based model for predicting study session dropout, DAS: Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment. DAS has an encoder-decoder structure which is composed of stacked multi-head attention and point-wise feed-forward networks. The deep attentive computations in DAS are capable of capturing complex relations among dynamic student interactions. To the best of our knowledge, this is the first attempt to investigate study session dropout in a mobile learning environment. Empirical evaluations on a large-scale dataset show that DAS achieves the best performance with a significant improvement in area under the receiver operating characteristic curve compared to baseline models.
ROJun 10, 2019
Data-Efficient and Safe Learning for Humanoid Locomotion Aided by a Dynamic Balancing ModelJunhyeok Ahn, Jaemin Lee, Luis Sentis
In this letter, we formulate a novel Markov Decision Process (MDP) for safe and data-efficient learning for humanoid locomotion aided by a dynamic balancing model. In our previous studies of biped locomotion, we relied on a low-dimensional robot model, commonly used in high-level Walking Pattern Generators (WPGs). However, a low-level feedback controller cannot precisely track desired footstep locations due to the discrepancies between the full order model and the simplified model. In this study, we propose mitigating this problem by complementing a WPG with reinforcement learning. More specifically, we propose a structured footstep control method consisting of a WPG, a neural network, and a safety controller. The WPG provides an analytical method that promotes efficient learning while the neural network maximizes long-term rewards, and the safety controller encourages safe exploration based on step capturability and the use of control-barrier functions. Our contributions include the following (1) a structured learning control method for locomotion, (2) a data-efficient and safe learning process to improve walking using a physics-based model, and (3) the scalability of the procedure to various types of humanoid robots and walking.
ROMar 26, 2019
Efficient Trajectory Generation for Robotic Systems Constrained by Contact ForcesJaemin Lee, Efstathios Bakolas, Luis Sentis
In this work, we propose a trajectory generation method for robotic systems with contact force constraint based on optimal control and reachability analysis. Normally, the dynamics and constraints of the contact-constrained robot are nonlinear and coupled to each other. Instead of linearizing the model and constraints, we directly solve the optimal control problem to obtain the feasible state trajectory and the control input of the system. A tractable optimal control problem is formulated which is addressed by dual approaches, which are sampling-based dynamic programming and rigorous reachability analysis. The sampling-based method and Partially Observable Markov Decision Process (POMDP) are used to break down the end-to-end trajectory generation problem via sample-wise optimization in terms of given conditions. The result generates sequential pairs of subregions to be passed to reach the final goal. The reachability analysis ensures that we will find at least one trajectory starting from a given initial state and going through a sequence of subregions. The distinctive contributions of our method are to enable handling the intricate contact constraint coupled with system's dynamics due to the reduction of computational complexity of the algorithm. We validate our method using extensive numerical simulations with a legged robot.
ROSep 27, 2018
Trajectory Generation for Robotic Systems with Contact Force ConstraintsJaemin Lee, Efstathios Bakolas, Luis Sentis
This paper presents a trajectory generation method for contact-constrained robotic systems such as manipulators and legged robots. Contact-constrained systems are affected by the interaction forces between the robot and the environment. In turn, these forces determine and constrain state reachability of the robot parts or end effectors. Our study subdivides the trajectory generation problem and the supporting reachability analysis into tractable subproblems consisting of a sampling problem, a convex optimization problem, and a nonlinear programming problem. Our method leads to significant reduction of computational cost. The proposed approach is validated using a realistic simulated contact-constrained robotic system.
ROSep 24, 2018
Social Navigation Planning Based on People's Awareness of RobotsMinkyu Kim, Jaemin Lee, Steven Jens Jorgensen et al.
When mobile robots maneuver near people, they run the risk of rudely blocking their paths; but not all people behave the same around robots. People that have not noticed the robot are the most difficult to predict. This paper investigates how mobile robots can generate acceptable paths in dynamic environments by predicting human behavior. Here, human behavior may include both physical and mental behavior, we focus on the latter. We introduce a simple safe interaction model: when a human seems unaware of the robot, it should avoid going too close. In this study, people around robots are detected and tracked using sensor fusion and filtering techniques. To handle uncertainties in the dynamic environment, a Partially-Observable Markov Decision Process Model (POMDP) is used to formulate a navigation planning problem in the shared environment. People's awareness of robots is inferred and included as a state and reward model in the POMDP. The proposed planner enables a robot to change its navigation plan based on its perception of each person's robot-awareness. As far as we can tell, this is a new capability. We conduct simulation and experiments using the Toyota Human Support Robot (HSR) to validate our approach. We demonstrate that the proposed framework is capable of running in real-time.
ROSep 24, 2018
Prioritized Kinematic Control of Joint-Constrained Head-Eye Robots using the Intermediate Value ApproachSteven Jens Jorgensen, Orion Campbell, Travis Llado et al.
Existing gaze controllers for head-eye robots can only handle single fixation points. Here, a generic controller for head-eye robots capable of executing simultaneous and prioritized fixation trajectories in Cartesian space is presented. This enables the specification of multiple operational-space behaviors with priority such that the execution of a low priority head orientation task does not disturb the satisfaction of a higher prioritized eye gaze task. Through our approach, the head-eye robot inherently gains the biomimetic vestibulo-ocular reflex (VOR), which is the ability of gaze stabilization under self generated movements. The described controller utilizes recursive null space projections to encode joint limit constraints and task priorities. To handle the solution discontinuity that occurs when joint limit tasks are inserted or removed as a constraint, the Intermediate Desired Value (IDV) approach is applied. Experimental validation of the controller's properties is demonstrated with the Dreamer humanoid robot. Our contribution is on (1) the formulation of a desired gaze task as an operational space orientation task, (2) the application details of the IDV approach for the prioritized head-eye robot controller that can handle intermediate joint constraints, and (3) a minimum-jerk specification for behavior and trajectory generation in Cartesian space.
ROJul 3, 2018
Computationally-Robust and Efficient Prioritized Whole-Body Controller with Contact ConstraintsDonghyun Kim, Jaemin Lee, Orion Campbell et al.
In this paper, we devise methods for the multi- objective control of humanoid robots, a.k.a. prioritized whole- body controllers, that achieve efficiency and robustness in the algorithmic computations. We use a form of whole-body controllers that is very general via incorporating centroidal momentum dynamics, operational task priorities, contact re- action forces, and internal force constraints. First, we achieve efficiency by solving a quadratic program that only involves the floating base dynamics and the reaction forces. Second, we achieve computational robustness by relaxing task accelerations such that they comply with friction cone constraints. Finally, we incorporate methods for smooth contact transitions to enhance the control of dynamic locomotion behaviors. The proposed methods are demonstrated both in simulation and in real experiments using a passive-ankle bipedal robot.
ROAug 7, 2017
Robust Dynamic Locomotion via Reinforcement Learning and Novel Whole Body ControllerDonghyun Kim, Jaemin Lee, Luis Sentis
We propose a robust dynamic walking controller consisting of a dynamic locomotion planner, a reinforcement learning process for robustness, and a novel whole-body locomotion controller (WBLC). Previous approaches specify either the position or the timing of steps, however, the proposed locomotion planner simultaneously computes both of these parameters as locomotion outputs. Our locomotion strategy relies on devising a reinforcement learning (RL) approach for robust walking. The learned policy generates multi step walking patterns, and the process is quick enough to be suitable for real-time controls. For learning, we devise an RL strategy that uses a phase space planner (PSP) and a linear inverted pendulum model to make the problem tractable and very fast. Then, the learned policy is used to provide goal-based commands to the WBLC, which calculates the torque commands to be executed in full-humanoid robots. The WBLC combines multiple prioritized tasks and calculates the associated reaction forces based on practical inequality constraints. The novel formulation includes efficient calculation of the time derivatives of various Jacobians. This provides high-fidelity dynamic control of fast motions. More specifically, we compute the time derivative of the Jacobian for various tasks and the Jacobian of the centroidal momentum task by utilizing Lie group operators and operational space dynamics respectively. The integration of RL-PSP and the WBLC provides highly robust, versatile, and practical locomotion including steering while walking and handling push disturbances of up to 520 N during an interval of 0.1 sec. Theoretical and numerical results are tested through a 3D physics-based simulation of the humanoid robot Valkyrie.