Dohyeon Kim

CV
h-index: 19
Papers: 8
Citations: 33
Novelty: 43%
AI Score: 43

8 Papers

CV, Dec 9, 2025
EgoX: Egocentric Video Generation from a Single Exocentric Video

Taewoong Kang, Kinam Kim, Dohyeon Kim et al.

Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio-temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width-wise and channel-wise concatenation. Additionally, a geometry-guided self-attention mechanism selectively attends to spatially relevant regions, ensuring geometric coherence and high visual fidelity. Our approach achieves coherent and realistic egocentric video generation while demonstrating strong scalability and robustness across unseen and in-the-wild videos.

86.0, RO, May 5
RLDX-1 Technical Report

Dongyoung Kim, Huiwon Jang, Myungkyu Koo et al.

While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, memory-aware decision making, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including synthesizing training data for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. $π_{0.5}$ and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while $π_{0.5}$ and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.

OC, Jan 21, 2025
MirrorCBO: A consensus-based optimization method in the spirit of mirror descent

Leon Bungert, Franca Hoffmann, Dohyeon Kim et al.

In this work we propose MirrorCBO, a consensus-based optimization (CBO) method which generalizes standard CBO in the same way that mirror descent generalizes gradient descent. For this we apply the CBO methodology to a swarm of dual particles and retain the primal particle positions by applying the inverse of the mirror map, which we parametrize as the subdifferential of a strongly convex function $φ$. In this way, we combine the advantages of a derivative-free non-convex optimization algorithm with those of mirror descent. As a special case, the method extends CBO to optimization problems with convex constraints. Assuming bounds on the Bregman distance associated to $φ$, we provide asymptotic convergence results for MirrorCBO with explicit exponential rate. Another key contribution is an exploratory numerical study of this new algorithm across different application settings, focusing on (i) sparsity-inducing optimization, and (ii) constrained optimization, demonstrating the competitive performance of MirrorCBO. We observe empirically that the method can also be used for optimization on (non-convex) submanifolds of Euclidean space, can be adapted to mirrored versions of other recent CBO variants, and that it inherits from mirror descent the capability to select desirable minimizers, like sparse ones. We also include an overview of recent CBO approaches for constrained optimization and compare their performance to MirrorCBO.
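The dual-particle idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the specific choice φ(x) = ½‖x‖² + λ‖x‖₁ (for which the inverse mirror map is componentwise soft-thresholding), a simple Euler-Maruyama discretization with componentwise noise, and hypothetical parameter names (`drift`, `noise`, `alpha`, `lam`) and defaults chosen only for illustration.

```python
import math
import random


def soft_threshold(y, lam):
    """Inverse mirror map for the assumed phi(x) = 0.5*|x|^2 + lam*|x|_1."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]


def consensus_point(xs, objective, alpha):
    """Gibbs-weighted average of the primal particles."""
    vals = [objective(x) for x in xs]
    vmin = min(vals)  # shift so every exponent is <= 0 (numerical stability)
    ws = [math.exp(-alpha * (v - vmin)) for v in vals]
    total = sum(ws)
    dim = len(xs[0])
    return [sum(w * x[d] for w, x in zip(ws, xs)) / total for d in range(dim)]


def mirror_cbo(objective, dim, n_particles=50, steps=300, dt=0.05,
               drift=1.0, noise=0.5, alpha=50.0, lam=0.1, seed=0):
    """Illustrative MirrorCBO-style loop: evolve dual particles, read off primal ones."""
    rng = random.Random(seed)
    # Dual particles y; primal positions are recovered via the inverse mirror map.
    ys = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n_particles)]
    for _ in range(steps):
        xs = [soft_threshold(y, lam) for y in ys]
        m = consensus_point(xs, objective, alpha)
        for y, x in zip(ys, xs):
            for d in range(dim):
                diff = x[d] - m[d]
                # The CBO drift and noise act on the dual variable.
                y[d] += (-drift * diff * dt
                         + noise * abs(diff) * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    xs = [soft_threshold(y, lam) for y in ys]
    return consensus_point(xs, objective, alpha)


if __name__ == "__main__":
    # Hypothetical test objective: a quadratic with a sparse minimizer at (1, 0).
    print(mirror_cbo(lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2, dim=2))
```

Because the primal particles are always images of the soft-thresholding map, coordinates whose dual value stays within λ of zero are exactly zero, which is one way to see how a mirror map can favor sparse minimizers.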

AI, Nov 22, 2025
How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game

Mingyu Jeon, Jaeyoung Suh, Suwan Cho et al.

With the rapid advancement of Large Language Models (LLMs), recent studies have drawn attention to their potential for handling not only simple question-answer tasks but also more complex conversational abilities and performing human-like behavioral imitations. In particular, there is considerable interest in how accurately LLMs can reproduce real human emotions and behaviors, as well as whether such reproductions can function effectively in real-world scenarios. However, existing benchmarks focus primarily on knowledge-based assessment and thus fall short of sufficiently reflecting social interactions and strategic dialogue capabilities. To address these limitations, this work proposes a methodology to quantitatively evaluate the human emotional and behavioral imitation and strategic decision-making capabilities of LLMs by employing a Buy-and-Sell negotiation simulation. Specifically, we assign different personas to multiple LLMs and conduct negotiations between a Buyer and a Seller, comprehensively analyzing outcomes such as win rates, transaction prices, and SHAP values. Our experimental results show that models with higher existing benchmark scores tend to achieve better negotiation performance overall, although some models exhibit diminished performance in scenarios emphasizing emotional or social contexts. Moreover, competitive and cunning traits prove more advantageous for negotiation outcomes than altruistic and cooperative traits, suggesting that the assigned persona can lead to significant variations in negotiation strategies and results. Consequently, this study introduces a new evaluation approach for LLMs' social behavior imitation and dialogue strategies, and demonstrates how negotiation simulations can serve as a meaningful complementary metric to measure real-world interaction capabilities, an aspect often overlooked in existing benchmarks.

CV, Mar 31, 2024
Statistical Analysis by Semiparametric Additive Regression and LSTM-FCN Based Hierarchical Classification for Computer Vision Quantification of Parkinsonian Bradykinesia

Youngseo Cho, In Hee Kwak, Dohyeon Kim et al.

Bradykinesia, characterized by involuntary slowing or decrement of movement, is a fundamental symptom of Parkinson's Disease (PD) and is vital for its clinical diagnosis. Despite various methodologies explored to quantify bradykinesia, computer vision-based approaches have shown promising results. However, these methods often fall short in adequately addressing key bradykinesia characteristics in repetitive limb movements: "occasional arrest" and "decrement in amplitude." This research advances vision-based quantification of bradykinesia by introducing nuanced numerical analysis to capture decrement in amplitudes and employing a simple deep learning technique, LSTM-FCN, for precise classification of occasional arrests. Our approach structures the classification process hierarchically, tailoring it to the unique dynamics of bradykinesia in PD. Statistical analysis of the extracted features, including those representing arrest and fatigue, has demonstrated their statistical significance in most cases. This finding underscores the importance of considering "occasional arrest" and "decrement in amplitude" in bradykinesia quantification of limb movement. Our enhanced diagnostic tool has been rigorously tested on an extensive dataset comprising 1396 motion videos from 310 PD patients, achieving an accuracy of 80.3%. The results confirm the robustness and reliability of our method.

DC, Sep 2, 2021
A Reliable, Self-Adaptive Face Identification Framework via Lyapunov Optimization

Dohyeon Kim, Joongheon Kim, Jae young Bang

Realtime face identification (FID) from a video feed is highly computation-intensive, and may exhaust computation resources if performed on a device with a limited amount of resources (e.g., a mobile device). In general, FID performs better when images are sampled at a higher rate, minimizing false negatives. However, performing it at an overwhelmingly high rate exposes the system to the risk of a queue overflow that hampers the system's reliability. This paper proposes a novel, queue-aware FID framework that adapts the sampling rate to maximize the FID performance while avoiding a queue overflow by implementing the Lyapunov optimization. A preliminary evaluation via a trace-based simulation confirms the effectiveness of the framework.
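The queue-aware rate adaptation described above can be sketched with the standard drift-plus-penalty rule from Lyapunov optimization. This is a minimal illustration, not the paper's framework: the utility function (`math.log1p`), the candidate rates, the fixed per-slot service capacity, and the trade-off weight `v` are all hypothetical choices made for the example.

```python
import math


def choose_rate(queue_len, rates, v, utility):
    """Drift-plus-penalty rule: pick the rate f maximizing v*utility(f) - Q*f."""
    return max(rates, key=lambda f: v * utility(f) - queue_len * f)


def simulate(rates, service_per_slot, v, slots, utility=math.log1p):
    """Run the queue for a number of slots and record (rate, backlog) pairs."""
    queue_len = 0.0
    history = []
    for _ in range(slots):
        f = choose_rate(queue_len, rates, v, utility)
        # Frames sampled this slot arrive; up to service_per_slot are processed.
        queue_len = max(queue_len + f - service_per_slot, 0.0)
        history.append((f, queue_len))
    return history


if __name__ == "__main__":
    for rate, backlog in simulate([1, 5, 10, 30], service_per_slot=8, v=20, slots=10):
        print(rate, backlog)
```

With an empty queue the rule picks the highest sampling rate; as the backlog grows, the `queue_len * f` term dominates and the rule backs off to lower rates. A larger `v` favors identification performance at the cost of a longer average queue.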

HC, Sep 18, 2018
Modelling the Intrusive feelings of advanced driver assistance systems based on vehicle activity log data: a case study for the lane keeping assistance system

Kyudong Park, Jiyoung Kwahk, Sung H. Han et al.

Although the automotive industry has been among the sectors that best understand the importance of drivers' affect, the focus of design and research in the automotive field has long emphasized the visceral aspects of exterior and interior design. With the adoption of Advanced Driver Assistance Systems (ADAS), endowing 'semi-autonomy' to the vehicles, however, the scope of affective design should be expanded to include the behavioural aspects of the vehicle. In such a 'shared-control' system wherein the vehicle can intervene in the human driver's operations, a certain degree of 'intrusive feelings' is unavoidable. For example, when the Lane Keeping Assistance System (LKAS), one of the most popular examples of ADAS, operates the steering wheel in a dangerous situation, the driver may feel interrupted or surprised because of the abrupt torque generated by LKAS. This kind of unpleasant experience can lead to prolonged negative feelings such as irritation, anxiety, and distrust of the system. Therefore, there is an increasing need to investigate the driver's affective responses towards the vehicle's dynamic behaviour. In this study, four types of intrusive feelings caused by LKAS were identified and proposed as a quantitative performance indicator for designing the affectively satisfactory behaviour of LKAS. A metric as well as a statistical data analysis method to quantitatively measure the intrusive feelings through the vehicle sensor log data are also presented.

HC, Sep 16, 2018
Usability of the Size, Spacing, and Depth of Virtual Buttons on Head-Mounted Displays

Kyudong Park, Dohyeon Kim, Sung H. Han

Virtual reality (VR) allows users to see and manipulate virtual scenes and items through input devices, like head-mounted displays. In this study, the effects of button size, spacing, and depth on the usability of virtual buttons in VR environments were investigated. Task completion time, number of errors, and subjective preferences were collected to test different levels of the button size, spacing, and depth. The experiment was conducted in a desktop setting with Oculus Rift and Leap Motion. A total of 18 subjects performed a button selection task. The optimal levels of button size and spacing within the experimental conditions are 25 mm and between 5 mm and 9 mm, respectively. Button sizes of 15 mm with 1-mm spacing were too small to be used in VR environments. A trend of decreasing task completion time and number of errors was observed as button size and spacing increased. However, large size and spacing may cause fatigue, due to continuous extension of the arms. For depth effects, the touch method took a shorter task completion time. However, the push method recorded a smaller number of errors, owing to the visual push-feedback. In this paper, we discuss advantages and disadvantages in detail. The results can be applied to many different application areas with VR HMDs.