Il hong Suh

CV
h-index5
7papers
969citations
Novelty53%
AI Score43

7 Papers

LGMar 26
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs

Selim An, Il hong Suh, Yeseong Kim

Quantization techniques such as BitsAndBytes, AWQ, and GPTQ are widely used as a standard method in deploying large language models but often degrades accuracy when using low-bit representations, e.g., 4 bits. Low-rank correction methods (e.g., LQER, QERA, ASER) has been proposed to mitigate this issue, however, they restore all layers and insert error-correction modules into every decoder block, which increases latency and memory overhead. To address this limitation, we propose GlowQ, a group-shared low-rank approximation for quantized LLMs that caches a single shared right factor per input-sharing group and restores only the groups or layers that yield the highest accuracy benefit. GlowQ computes the high-precision projection once per input-sharing group and reuses it across its modules, reducing parameter and memory overhead, and retaining the expressivity of layer-specific corrections. We also propose a selective variant, GlowQ-S, that applies the cached shared module only where it provides the largest benefit. Compared with strong baselines, our approach reduces TTFB by (5.6%) and increases throughput by (9.6%) on average, while reducing perplexity on WikiText-2 by (0.17%) and increasing downstream accuracy by 0.42 percentage points. The selective model GlowQ-S further reduces latency, cutting TTFB by (23.4%) and increasing throughput by (37.4%), while maintaining accuracy within 0.2 percentage points on average.

CLMay 1, 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation

Chaitali Bhattacharyya, Hyunsei Lee, Junyoung Lee et al.

Training large language models (LLMs) from scratch requires significant computational resources, driving interest in developing smaller, domain-specific LLMs that maintain both efficiency and strong task performance. Medium-sized models such as LLaMA, llama} have served as starting points for domain-specific adaptation, but they often suffer from accuracy degradation when tested on specialized datasets. We introduce FineScope, a framework for deriving compact, domain-optimized LLMs from larger pretrained models. FineScope leverages the Sparse Autoencoder (SAE) framework, inspired by its ability to produce interpretable feature representations, to extract domain-specific subsets from large datasets. We apply structured pruning with domain-specific constraints, ensuring that the resulting pruned models retain essential knowledge for the target domain. To further enhance performance, these pruned models undergo self-data distillation, leveraging SAE-curated datasets to restore key domain-specific information lost during pruning. Extensive experiments and ablation studies demonstrate that FineScope achieves highly competitive performance, outperforming several large-scale state-of-the-art LLMs in domain-specific tasks. Additionally, our results show that FineScope enables pruned models to regain a substantial portion of their original performance when fine-tuned with SAE-curated datasets. Furthermore, applying these datasets to fine-tune pretrained LLMs without pruning also improves their domain-specific accuracy, highlighting the robustness of our approach.

ROMar 12, 2021
Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning

Reinis Cimurs, Il Hong Suh, Jin Han Lee

In this paper, we present an autonomous navigation system for goal-driven exploration of unknown environments through deep reinforcement learning (DRL). Points of interest (POI) for possible navigation directions are obtained from the environment and an optimal waypoint is selected, based on the available data. Following the waypoints, the robot is guided towards the global goal and the local optimum problem of reactive navigation is mitigated. Then, a motion policy for local navigation is learned through a DRL framework in a simulation. We develop a navigation system where this learned policy is integrated into a motion planning stack as the local navigation layer to move the robot between waypoints towards a global goal. The fully autonomous navigation is performed without any prior knowledge while a map is recorded as the robot moves through the environment. Experiments show that the proposed method has an advantage over similar exploration methods, without reliance on a map or prior information in complex static as well as dynamic environments.

LGFeb 27, 2020
Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Taewon Kim, Yeseong Park, Youngbin Park et al.

For a robotic grasping task in which diverse unseen target objects exist in a cluttered environment, some deep learning-based methods have achieved state-of-the-art results using visual input directly. In contrast, actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects, especially when learning from raw images and sparse rewards. To make these RL techniques feasible for vision-based grasping tasks, we employ state representation learning (SRL), where we encode essential information first for subsequent use in RL. However, typical representation learning procedures are unsuitable for extracting pertinent information for learning the grasping skill, because the visual inputs for representation learning, where a robot attempts to grasp a target object in clutter, are extremely complex. We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation. This enables deep RL to learn robotic grasping skills from highly varied and diverse visual inputs. We demonstrate the effectiveness of this approach with varying levels of disentanglement in a realistic simulated environment.

CVJul 24, 2019
From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation

Jin Han Lee, Myung-Kyu Han, Dong Wook Ko et al.

Estimating accurate depth from a single image is challenging because it is an ill-posed problem as infinitely many 3D scenes can be projected to the same 2D scene. However, recent works based on deep convolutional neural networks show great progress with plausible results. The convolutional neural networks are generally composed of two parts: an encoder for dense feature extraction and a decoder for predicting the desired depth. In the encoder-decoder schemes, repeated strided convolution and spatial pooling layers lower the spatial resolution of transitional outputs, and several techniques such as skip connections or multi-layer deconvolutional networks are adopted to recover the original resolution for effective dense prediction. In this paper, for more effective guidance of densely encoded features to the desired depth prediction, we propose a network architecture that utilizes novel local planar guidance layers located at multiple stages in the decoding phase. We show that the proposed method outperforms the state-of-the-art works with significant margin evaluating on challenging benchmarks. We also provide results from an ablation study to validate the effectiveness of the proposed method.

CVJan 20, 2017
Efficient Feature Matching by Progressive Candidate Search

Sehyung Lee, Jongwoo Lim, Il Hong Suh

We present a novel feature matching algorithm that systematically utilizes the geometric properties of features such as position, scale, and orientation, in addition to the conventional descriptor vectors. In challenging scenes with the presence of repetitive patterns or with a large viewpoint change, it is hard to find the correct correspondences using feature descriptors only, since the descriptor distances of the correct matches may not be the least among the candidates due to appearance changes. Assuming that the layout of the nearby features does not changed much, we propose the bidirectional transfer measure to gauge the geometric consistency of a pair of feature correspondences. The feature matching problem is formulated as a Markov random field (MRF) which uses descriptor distances and relative geometric similarities together. The unmatched features are explicitly modeled in the MRF to minimize its negative impact. For speed and stability, instead of solving the MRF on the entire features at once, we start with a small set of confident feature matches, and then progressively search the candidates in nearby features and expand the MRF with them. Experimental comparisons show that the proposed algorithm finds better feature correspondences, i.e. more matches with higher inlier ratio, in many challenging scenes with much lower computational cost than the state-of-the-art algorithms.

CVApr 15, 2016
Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Youngbin Park, Sungphill Moon, Il Hong Suh

Kinect skeleton tracker is able to achieve considerable human body tracking performance in convenient and a low-cost manner. However, The tracker often captures unnatural human poses such as discontinuous and vibrated motions when self-occlusions occur. A majority of approaches tackle this problem by using multiple Kinect sensors in a workspace. Combination of the measurements from different sensors is then conducted in Kalman filter framework or optimization problem is formulated for sensor fusion. However, these methods usually require heuristics to measure reliability of measurements observed from each Kinect sensor. In this paper, we developed a method to improve Kinect skeleton using single Kinect sensor, in which supervised learning technique was employed to correct unnatural tracking motions. Specifically, deep recurrent neural networks were used for improving joint positions and velocities of Kinect skeleton, and three methods were proposed to integrate the refined positions and velocities for further enhancement. Moreover, we suggested a novel measure to evaluate naturalness of captured motions. We evaluated the proposed approach by comparison with the ground truth obtained using a commercial optical maker-based motion capture system.