Donghyuk Kim

RO
3 papers
12 citations
Novelty 63%
AI Score 38

3 Papers

IV · Dec 13, 2025
V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval

Donghyuk Kim, Sejeong Yang, Wonjin Shin et al.

Streaming video large language models (LLMs) are increasingly used for real-time multimodal tasks such as video captioning, question answering, conversational agents, and augmented reality. However, these models face fundamental memory and computational challenges because their key-value (KV) caches grow substantially with continuous streaming video input. Processing this input requires an iterative prefill stage, a feature unique to streaming video LLMs, and this stage brings significant limitations: extensive computation, substantial data transfer, and degraded accuracy. The problem is worse on edge devices, which are the primary deployment target for these models. In this work, we propose V-Rex, the first software-hardware co-designed accelerator that comprehensively addresses both algorithmic and hardware bottlenecks in streaming video LLM inference. At its core, V-Rex introduces ReSV, a training-free dynamic KV cache retrieval algorithm. ReSV exploits temporal and spatial similarity-based token clustering to reduce excessive KV cache memory across video frames. To fully realize these algorithmic benefits, V-Rex offers a compact, low-latency hardware accelerator with a dynamic KV cache retrieval engine (DRE), featuring bit-level and early-exit based computing units. V-Rex achieves real-time (3.9-8.3 FPS), energy-efficient streaming video LLM inference on edge devices with negligible accuracy loss. Although the DRE accounts for only 2.2% of power and 2.0% of area, the system delivers 1.9-19.7x speedup and 3.1-18.5x better energy efficiency than an AGX Orin GPU. This work is the first to comprehensively tackle KV cache retrieval across algorithms and hardware, enabling real-time streaming video LLM inference on resource-constrained edge devices.
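The abstract does not describe how ReSV forms its clusters, so the following is only a minimal sketch of the general idea of similarity-based token clustering, not the paper's algorithm. It assumes each cached token is represented by a key vector, uses a greedy pass with a cosine-similarity threshold, and treats the first member of each cluster as its representative. The names `cosine`, `cluster_tokens`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_tokens(keys, threshold=0.95):
    """Greedily group token indices whose keys are similar.

    Each key joins the first cluster whose representative (its first member)
    has cosine similarity >= threshold; otherwise it starts a new cluster.
    This is an illustrative assumption, not the ReSV procedure.
    """
    reps = []      # representative key vector per cluster
    clusters = []  # token indices per cluster
    for i, key in enumerate(keys):
        for c, rep in enumerate(reps):
            if cosine(key, rep) >= threshold:
                clusters[c].append(i)
                break
        else:
            reps.append(key)
            clusters.append([i])
    return clusters


# Toy keys: indices 0/1 and 2/3 mimic the same patch seen in adjacent frames.
keys = [
    [1.0, 0.0], [0.99, 0.05],
    [0.0, 1.0], [0.02, 1.0],
    [0.7, 0.7],
]
clusters = cluster_tokens(keys)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

In this toy case five cached keys collapse to three representatives, which is the kind of redundancy across frames that a retrieval scheme could exploit.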

RO · Sep 27, 2019
TORM: Fast and Accurate Trajectory Optimization of Redundant Manipulator given an End-Effector Path

Mincheul Kang, Heechan Shin, Donghyuk Kim et al.

A redundant manipulator has multiple inverse kinematics solutions per end-effector pose. Accordingly, there can be many trajectories for joints that follow a given end-effector path in the Cartesian space. In this paper, we present a trajectory optimization of a redundant manipulator (TORM) to synthesize a trajectory that follows a given end-effector path accurately, while achieving smoothness and collision-free manipulation. Our method holistically incorporates three desired properties into the trajectory optimization process by integrating the Jacobian-based inverse kinematics solving method and an optimization-based motion planning approach. Specifically, we optimize a trajectory using two-stage gradient descent to reduce potential competition between different properties during the update. To avoid falling into local minima, we iteratively explore different candidate trajectories with our local update. We compare our method with state-of-the-art methods in test scenes including external obstacles and two non-obstacle problems. Our method robustly minimizes the pose error in a progressive manner while satisfying various desirable properties.
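The opening claim, that a redundant manipulator has many joint solutions for one end-effector pose, can be illustrated with a small sketch. This is not TORM: it assumes a hypothetical planar arm with three unit-length links tracking only a 2D position, and it uses the plain Jacobian-transpose update as a stand-in for a Jacobian-based inverse kinematics solver. The link lengths, step size, iteration count, and start configurations are all illustrative assumptions.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # hypothetical link lengths of a planar 3-joint arm


def forward(q):
    """End-effector (x, y) for joint angles q (relative angles, radians)."""
    x = y = angle = 0.0
    for length, qi in zip(LINKS, q):
        angle += qi
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def jacobian_columns(q):
    """Columns (dx/dq_j, dy/dq_j) of the 2x3 position Jacobian."""
    angles, a = [], 0.0
    for qi in q:
        a += qi
        angles.append(a)
    n = len(q)
    return [
        (
            -sum(LINKS[i] * math.sin(angles[i]) for i in range(j, n)),
            sum(LINKS[i] * math.cos(angles[i]) for i in range(j, n)),
        )
        for j in range(n)
    ]


def position_error(q, target):
    x, y = forward(q)
    return math.hypot(target[0] - x, target[1] - y)


def solve_ik(q0, target, step=0.05, iters=5000):
    """Jacobian-transpose iteration: q <- q + step * J^T * (target - f(q))."""
    q = list(q0)
    for _ in range(iters):
        x, y = forward(q)
        ex, ey = target[0] - x, target[1] - y
        cols = jacobian_columns(q)  # evaluated before q is modified
        for j, (dx, dy) in enumerate(cols):
            q[j] += step * (dx * ex + dy * ey)
    return q


target = (1.5, 1.0)
q_a = solve_ik((0.3, 0.3, 0.3), target)    # starts bent one way
q_b = solve_ik((1.2, -0.6, -0.6), target)  # starts bent the other way
print(q_a, position_error(q_a, target))
print(q_b, position_error(q_b, target))
```

Both runs are expected to reach the same target position with different joint angles. A trajectory optimizer for a redundant arm has to choose among such alternatives at every point along the end-effector path.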

RO · Sep 20, 2018
Harmonious Sampling for Mobile Manipulation Planning

Mincheul Kang, Donghyuk Kim, Sung-Eui Yoon

Mobile manipulation planning commonly adopts a decoupled approach that performs planning separately on the base and the manipulator. While this approach is fast, it can generate sub-optimal paths. Another direction is a coupled approach that jointly adjusts the base and manipulator in a high-dimensional configuration space. This coupled approach addresses the sub-optimality and incompleteness of the decoupled approach, but has not been widely used due to its excessive computational overhead. Given this trade-off space, we present a simple, yet effective mobile manipulation sampling method, harmonious sampling, to perform the coupled approach mainly in difficult regions, where we need to simultaneously maneuver the base and the manipulator. Our method identifies such difficult regions through a low-dimensional base space by utilizing a reachability map given the target end-effector pose and narrow passages detected by a generalized Voronoi diagram. For the remaining simple regions, we perform sampling mainly on the base configurations with a predefined joint configuration, accelerating the planning process. We compare our method with the decoupled and coupled approaches in six different problems with varying difficulty. Our method shows meaningful improvements experimentally in terms of time to find an initial solution (up to 5.6 times faster) and final solution cost (up to 17% lower) over the decoupled approach, especially in difficult scenes with narrow space. We also demonstrate these benefits with a real, mobile Hubo robot.
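The region-dependent sampling described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. The difficult-region test is a hypothetical placeholder for the reachability map and Voronoi-based narrow-passage detection, and the workspace bounds, the arm with three joints, `DEFAULT_ARM`, and the two bias probabilities are all invented for the example.

```python
import math
import random

# Hypothetical predefined joint configuration used in simple regions.
DEFAULT_ARM = (0.0, -1.0, 1.5)


def in_difficult_region(x, y):
    """Placeholder test; a real planner would consult a reachability map
    and narrow passages found with a generalized Voronoi diagram."""
    return 4.0 <= x <= 6.0


def harmonious_sample(rng, p_coupled_difficult=0.9, p_coupled_simple=0.1):
    """Return one sample (x, y, theta, j1, j2, j3).

    The base pose is always sampled. The arm is sampled jointly with the
    base (coupled) with a high probability in difficult regions and a low
    probability elsewhere; otherwise the predefined arm pose is used.
    """
    x, y = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
    theta = rng.uniform(-math.pi, math.pi)
    difficult = in_difficult_region(x, y)
    p_coupled = p_coupled_difficult if difficult else p_coupled_simple
    if rng.random() < p_coupled:
        arm = tuple(rng.uniform(-math.pi, math.pi) for _ in DEFAULT_ARM)
    else:
        arm = DEFAULT_ARM
    return (x, y, theta) + arm


rng = random.Random(0)
samples = [harmonious_sample(rng) for _ in range(1000)]
coupled = sum(1 for s in samples if s[3:] != DEFAULT_ARM)
print(f"{coupled} of {len(samples)} samples explore the full configuration space")
```

Most samples stay in the three-dimensional base space, so the costly high-dimensional search is concentrated where the base and arm must move together.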