Jun-Ho Oh

h-index34

6papers

29citations

Novelty45%

AI Score41

Ranked #64,056 of 194,257 authors (top 33%)#1,875 in RO (top 28%)

6 Papers

11.9ARMar 12

SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference

Yuseon Choi, Sangjin Kim, Jungjun Oh et al.

MoE models offer efficient scaling through conditional computation, but their large parameter size and expensive expert offloading make on-device deployment challenging. Existing acceleration techniques such as prefetching or expert clustering often increase energy usage or reduce expert diversity. We present SliceMoE, an energy-efficient MoE inference framework for miss-rate-constrained deployment. SliceMoE introduces Dynamic Bit-Sliced Caching (DBSC), which caches experts at slice-level granularity and assigns precision on demand to expand effective expert capacity. To support mixed-precision experts without memory duplication, we propose Calibration-Free Asymmetric Matryoshka Quantization (AMAT), a truncation-based scheme that maintains compatibility between low-bit and high-bit slices. We further introduce Predictive Cache Warmup (PCW) to reduce early-decode cold misses by reshaping cache contents during prefill. Evaluated on DeepSeek-V2-Lite and Qwen1.5-MoE-A2.7B, SliceMoE reduces decode-stage energy consumption by up to 2.37x and 2.85x, respectively, and improves decode latency by up to 1.81x and 1.64x, while preserving near-high-bit accuracy. These results demonstrate that slice-level caching enables an efficient on-device MoE deployment.

2.2ROJan 2, 2020Code

Motion Generation Interface of ROS to PODO Software Framework for Wheeled Huamanoid Robot

Moonyoung Lee, Yujin Heo, Saihim Cho et al.

This paper discusses the development of robot motion generation interface between a real-time software architecture and a non-real-time robot operating system. In order for robots to execute intelligent manipulation or navigation, close integration of high-level perception and low-level control is required. However, many available open-source perception modules are developed in ROS, which operates on Linux OS that don't guarantee RT performance. This can lead to non-deterministic responses and stability problems that can adversely affect robot control. As a result, many robotic systems devote RTOS for low-level motion control. Similarly, the humanoid robot platform developed at KAIST, Hubo, utilizes a custom real-time software framework called PODO. Although PODO provides easy interface for motion generation, it lacks interface to high-level frameworks such as ROS. As such, we present a new motion generation interface between ROS and PODO that enables users to generate motion trajectories through standard ROS messages while leveraging a real-time motion controller. With the proposed communication interface, we demonstrate series of manipulator tasks on the actual wheeled humanoid platform, M-Hubo. The overall communication interface responsiveness was at most 27 milliseconds.

7.1LGDec 15, 2025

SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision

Yuseon Choi, Sangjin Kim, Jungjun Oh et al.

Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging due to activation outliers, leading to accuracy degradation. Existing methods, such as outlier-handling and group quantization, achieve high accuracy but incur substantial energy consumption. To address this, we propose SeVeDo, an energy-efficient SVD-based heterogeneous accelerator that structurally separates outlier-sensitive components into a high-precision low-rank path, while the remaining computations are executed in a low-bit residual datapath with group quantization. To further enhance efficiency, Hierarchical Group Quantization (HGQ) combines coarse-grained floating-point scaling with fine-grained shifting, effectively reducing dequantization cost. Also, SVD-guided mixed precision (SVD-MP) statically allocates higher bitwidths to precision-sensitive components identified through low-rank decomposition, thereby minimizing floating-point operation cost. Experimental results show that SeVeDo achieves a peak energy efficiency of 13.8TOPS/W, surpassing conventional designs, with 12.7TOPS/W on ViT-Base and 13.4TOPS/W on Llama2-7B benchmarks.

2.2RONov 30, 2020

Dynamic Humanoid Locomotion over Uneven Terrain With Streamlined Perception-Control Pipeline

Moonyoung Lee, Youngsun Kwon, Sebin Lee et al.

Although bipedal locomotion provides the ability to traverse unstructured environments, it requires careful planning and control to safely walk across without falling. This poses an integrated challenge for the robot to perceive, plan, and control its movements, especially with dynamic motions where the robot may have to adapt its swing-leg trajectory onthe-fly in order to safely place its foot on the uneven terrain. In this paper we present an efficient geometric footstep planner and the corresponding walking controller that enables a humanoid robot to dynamically walk across uneven terrain at speeds up to 0.3 m/s. As dynamic locomotion, we refer first to the continuous walking motion without stopping, and second to the on-the-fly replanning of the landing footstep position in middle of the swing phase during the robot gait cycle. This is mainly achieved through the streamlined integration between an efficient sampling-based planner and robust walking controller. The footstep planner is able to generate feasible footsteps within 5 milliseconds, and the controller is able to generate a new corresponding swing leg trajectory as well as the wholebody motion to dynamically balance the robot to the newly updated footsteps. The proposed perception-control pipeline is evaluated and demonstrated with real experiments using a fullscale humanoid to traverse across uneven terrains featured by static stepping stones, dynamically movable stepping stone, or narrow path.

11.3ROJan 2, 2020

Fast Perception, Planning, and Execution for a Robotic Butler: Wheeled Humanoid M-Hubo

Moonyoung Lee, Yujin Heo, Jinyong Park et al.

As the aging population grows at a rapid rate, there is an ever growing need for service robot platforms that can provide daily assistance at practical speed with reliable performance. In order to assist with daily tasks such as fetching a beverage, a service robot must be able to perceive its environment and generate corresponding motion trajectories. This becomes a challenging and computationally complex problem when the environment is unknown and thus the path planner must sample numerous trajectories that often are sub-optimal, extending the execution time. To address this issue, we propose a unique strategy of integrating a 3D object detection pipeline with a kinematically optimal manipulation planner to significantly increase speed performance at runtime. In addition, we develop a new robotic butler system for a wheeled humanoid that is capable of fetching requested objects at 24% of the speed a human needs to fulfill the same task. The proposed system was evaluated and demonstrated in a real-world environment setup as well as in public exhibition.

3.6CVSep 21, 2015

Vision System and Depth Processing for DRC-HUBO+

Inwook Shim, Seunghak Shin, Yunsu Bok et al.

This paper presents a vision system and a depth processing algorithm for DRC-HUBO+, the winner of the DRC finals 2015. Our system is designed to reliably capture 3D information of a scene and objects robust to challenging environment conditions. We also propose a depth-map upsampling method that produces an outliers-free depth map by explicitly handling depth outliers. Our system is suitable for an interactive robot with real-world that requires accurate object detection and pose estimation. We evaluate our depth processing algorithm over state-of-the-art algorithms on several synthetic and real-world datasets.