Youngjae Kim

CR
h-index 3
8 papers
8 citations
Novelty 48%
AI Score 37

8 Papers

FLU-DYN Aug 4, 2023
On stable wrapper-based parameter selection method for efficient ANN-based data-driven modeling of turbulent flows

Hyeongeun Yun, Yongcheol Choi, Youngjae Kim et al.

To model complex turbulent flow and heat transfer phenomena, this study aims to analyze and develop a reduced modeling approach based on artificial neural network (ANN) and wrapper methods. This approach has an advantage over other methods such as the correlation-based filter method in terms of removing redundant or irrelevant parameters even under non-linearity among them. As a downside, the overfitting and randomness of ANN training may produce inconsistent subsets over selection trials especially in a higher physical dimension. This study analyzes a few existing ANN-based wrapper methods and develops a revised one based on the gradient-based subset selection indices to minimize the loss in the total derivative or the directional consistency at each elimination step. To examine parameter reduction performance and consistency-over-trials, we apply these methods to a manufactured subset selection problem, modeling of the bubble size in a turbulent bubbly flow, and modeling of the spatially varying turbulent Prandtl number in a duct flow. It is found that the gradient-based subset selection to minimize the total derivative loss results in improved consistency-over-trials compared to the other ANN-based wrapper methods, while removing unnecessary parameters successfully. For the reduced turbulent Prandtl number model, the gradient-based subset selection improves the prediction in the validation case over the other methods. Also, the reduced parameter subsets show a slight increase in the training speed compared to the others.
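The gradient-based idea can be illustrated with a small sensitivity ranking. This is a minimal sketch under stated assumptions, not the paper's selection index: `model` is a hypothetical stand-in for a trained ANN, and the score used here (mean absolute partial derivative, estimated by central differences) is a simplified substitute for the total-derivative loss described in the abstract. A wrapper method would drop the lowest-ranked input, retrain, and repeat.

```python
import math
import random

def model(x):
    # Hypothetical stand-in for a trained ANN: uses x[0] and x[1], ignores x[2].
    return math.sin(x[0]) + 0.5 * x[1] ** 2

def mean_abs_gradient(f, samples, i, h=1e-4):
    """Mean |df/dx_i| over the samples, estimated by central differences."""
    total = 0.0
    for x in samples:
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        total += abs(f(up) - f(down)) / (2 * h)
    return total / len(samples)

def rank_inputs(f, samples, names):
    """Return input names ordered from least to most influential."""
    scores = {name: mean_abs_gradient(f, samples, i) for i, name in enumerate(names)}
    return sorted(scores, key=scores.get)

rng = random.Random(0)
samples = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
ranking = rank_inputs(model, samples, ["x0", "x1", "x2"])
# ranking[0] is the first candidate for elimination.
```

In this toy setup the unused input `x2` has an exactly zero gradient, so it is ranked first for removal.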

80.6 DC Apr 29
DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

Bodon Jeong, Hongsu Byun, Youngjae Kim et al.

The increasing deployment of Large Language Model (LLM) inference on edge AI systems demands efficient execution under tight memory budgets. A key challenge arises from Key-Value (KV) caches, which often exceed available device memory. Although NVMe-based offloading offers scalable capacity, existing file-based designs rely heavily on the kernel page cache, leading to cache thrashing, unpredictable latency, and high software overhead under memory pressure. We present DUAL-BLADE, a dual-path KV residency framework that dynamically assigns KV tensors to either a page-cache path or an NVMe-direct path based on runtime memory availability. The NVMe-direct path bypasses the filesystem by mapping KV tensors to contiguous logical block address (LBA) regions, enabling low-overhead direct storage access. DUAL-BLADE further incorporates adaptive pipeline parallelism to overlap storage I/O with GPU DMA, improving inference throughput. Our evaluation shows that DUAL-BLADE substantially mitigates I/O bottlenecks, reducing prefill and decode latency by up to 33.1% and 42.4%, respectively, while improving SSD utilization by 2.2x across diverse memory budgets.
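The dual-path placement decision can be sketched as follows. This is an illustrative model only, with no real I/O: the class name `DualPathPlacer`, the 4096-byte block size, and the simple bump allocator for LBA regions are all assumptions, not details from the paper.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed logical block size in bytes

@dataclass
class DualPathPlacer:
    """Hypothetical placer: page-cache path while memory lasts, else NVMe-direct."""
    page_cache_budget: int                 # bytes still available on the page-cache path
    next_lba: int = 0                      # next free block on the raw device
    placements: dict = field(default_factory=dict)

    def place(self, tensor_id, nbytes):
        if nbytes <= self.page_cache_budget:
            # Enough memory: let the file-based path and page cache handle it.
            self.page_cache_budget -= nbytes
            placement = ("page_cache", None, None)
        else:
            # Memory pressure: reserve a contiguous LBA region for this tensor.
            nblocks = -(-nbytes // BLOCK_SIZE)  # ceiling division
            placement = ("nvme_direct", self.next_lba, nblocks)
            self.next_lba += nblocks
        self.placements[tensor_id] = placement
        return placement
```

Each NVMe-direct placement records a start LBA and a block count, so a tensor can later be fetched with a single contiguous read rather than through filesystem metadata.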

AI Apr 16, 2025
Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs

Hyungwoo Lee, Kihyun Kim, Jinwoo Kim et al.

Recent large language models (LLMs) face increasing inference latency as input context length and model size continue to grow. In particular, the retrieval-augmented generation (RAG) technique, which enhances LLM responses by incorporating external knowledge, exacerbates this issue by significantly increasing the number of input tokens. This expansion in token length leads to a substantial rise in computational overhead, particularly during the prefill stage, resulting in prolonged time-to-first-token (TTFT). To address this issue, this paper proposes a method to reduce TTFT by leveraging a disk-based key-value (KV) cache to lessen the computational burden during the prefill stage. We also introduce a disk-based shared KV cache management system, called Shared RAG-DCache, for multi-instance LLM RAG service environments. This system, together with an optimal system configuration, improves both throughput and latency under given resource constraints. Shared RAG-DCache exploits the locality of documents related to user queries in RAG, as well as the queueing delay in LLM inference services. It proactively generates and stores disk KV caches for query-related documents and shares them across multiple LLM instances to enhance inference performance. In experiments on a single host equipped with 2 GPUs and 1 CPU, Shared RAG-DCache achieved a 15~71% increase in throughput and a 12~65% reduction in latency, depending on the resource configuration.
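The idea of sharing per-document KV caches through disk can be sketched as below. This is a toy model under stated assumptions: `DiskKVCache` and `fake_prefill` are hypothetical names, the "KV cache" is a plain Python list rather than real tensors, and the sketch ignores proactive generation, eviction, and concurrent writers.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

class DiskKVCache:
    """Hypothetical disk cache keyed by document content, shared via a directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, doc_text):
        digest = hashlib.sha256(doc_text.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.kv"

    def get_or_compute(self, doc_text, compute_kv):
        """Return (kv, hit). On a miss, run the prefill stand-in and store the result."""
        path = self._path(doc_text)
        if path.exists():
            with path.open("rb") as f:
                return pickle.load(f), True
        kv = compute_kv(doc_text)
        tmp = path.with_suffix(".tmp")
        with tmp.open("wb") as f:
            pickle.dump(kv, f)
        tmp.replace(path)  # rename into place so readers never see a partial file
        return kv, False

def fake_prefill(text):
    # Stand-in for the expensive prefill computation.
    return [len(word) for word in text.split()]

with tempfile.TemporaryDirectory() as root:
    instance_a = DiskKVCache(root)
    instance_b = DiskKVCache(root)
    kv1, hit1 = instance_a.get_or_compute("doc about SSDs", fake_prefill)
    kv2, hit2 = instance_b.get_or_compute("doc about SSDs", fake_prefill)
```

The second instance finds the file written by the first and skips the prefill stand-in, which is the effect the abstract attributes to sharing caches across LLM instances.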

LG Apr 16, 2025
Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading

Kihyun Kim, Jinwoo Kim, Hyunsun Chung et al.

LLM inference is essential for applications like text summarization, translation, and data analysis, but the high cost of GPU instances from Cloud Service Providers (CSPs) like AWS is a major burden. This paper proposes InferSave, a cost-efficient VM selection framework for cloud-based LLM inference. InferSave optimizes KV cache offloading based on Service Level Objectives (SLOs) and workload characteristics, estimating GPU memory needs and recommending cost-effective VM instances. Additionally, the Compute Time Calibration Function (CTCF) improves instance selection accuracy by adjusting for discrepancies between theoretical and actual GPU performance. Experiments on AWS GPU instances show that selecting lower-cost instances without KV cache offloading improves cost efficiency by up to 73.7% for online workloads, while KV cache offloading saves up to 20.19% for offline workloads.
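The memory-estimation step can be illustrated with the standard KV cache size formula for a decoder-only transformer, followed by a cheapest-fit lookup. This is a sketch, not InferSave itself: it ignores SLOs, offloading, and CTCF, and the instance names and prices in `catalog` are made up for illustration.

```python
GIB = 2 ** 30

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Keys and values (factor 2) for every layer, head, token, and sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

def cheapest_fit(catalog, required_bytes):
    """Return the lowest-priced VM whose GPU memory covers the requirement, or None."""
    fits = [vm for vm in catalog if vm["gpu_mem_bytes"] >= required_bytes]
    return min(fits, key=lambda vm: vm["usd_per_hour"]) if fits else None

# Hypothetical catalog; names and prices are illustrative only.
catalog = [
    {"name": "vm-a", "gpu_mem_bytes": 16 * GIB, "usd_per_hour": 1.0},
    {"name": "vm-b", "gpu_mem_bytes": 24 * GIB, "usd_per_hour": 1.5},
    {"name": "vm-c", "gpu_mem_bytes": 80 * GIB, "usd_per_hour": 4.0},
]

weights_bytes = 14 * GIB  # assumed model weight footprint
kv_bytes = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=2048, batch=4)
choice = cheapest_fit(catalog, weights_bytes + kv_bytes)
```

With these assumed numbers the KV cache needs 4 GiB, so the 16 GiB instance is too small and the 24 GiB instance is selected. Offloading part of the KV cache to host memory would lower the requirement and could make the cheaper instance viable, which is the trade-off the abstract describes.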

CR Aug 14, 2021
A Policy-based Versioning SSD with Intel SGX

Jinwoo Ahn, Seungjin Lee, Jinhoon Lee et al.

Privileged malware neutralizes software-based versioning systems and destroys data. To counter this threat, a versioning solid-state drive (SSD) that performs versioning inside the SSD has been studied. An SSD is a suitable candidate for data versioning because it can preserve previous versions without additional copying, and provide high security with a very small trusted computing base (TCB). However, the versioning SSDs studied so far commonly use a full disk versioning method that preserves all file versions in a batch. This paper demonstrates that SSDs that provide full disk versioning can be exposed to data tampering attacks when the retention time of data is less than the malware's dwell time. To deal with this threat, we propose SGX-SSD, a policy-based per-file versioning SSD that keeps a deeper history for only the user's important files. However, since the SSD is not aware of file semantics, and the versioning policy information must be securely received from the untrusted host computer, implementing per-file versioning in the SSD is a huge challenge. To solve this problem, SGX-SSD utilizes Intel SGX and has a secure host interface to securely receive policy information (configuration values) from the user. Also, to solve the file semantic unawareness problem of the SSD, a piggyback module is designed to give a file hint at the host layer, and an algorithm for selective versioning based on the policy is implemented in the SSD. To validate our system, we prototyped SGX-SSD on the Jasmine OpenSSD platform in a Linux environment. In the experimental evaluation, we show that SGX-SSD provides strong security with little additional overhead for selective per-file versioning.

SE Oct 12, 2020
A Generic Framework For Capturing Reliability in Cyber Physical Systems

Nazakat Ali, Manzoor Hussain, Youngjae Kim et al.

Cyber Physical Systems solve complex problems through their tight integration between the physical and computational components. Therefore, reliability is the most critical requirement for a cyber physical system, because an unreliable system often leads to service disruption, property damage, financial losses, and sometimes fatalities. In order to develop more reliable CPS, this paper proposes a generic framework for reliability modeling and analysis for our ongoing work on cyber physical systems. This paper first defines an architecture for general CPS, which is comprised of three layers: environment layer, communication layer, and computational layer. Secondly, we formalize a reliability model for the architectural components, and then propose a framework for the reliability of CPS with the consideration of how to capture the reliability. Based on the research method, we demonstrate the proposed framework with an illustrative example by using different reliability values from offshore and onshore reliability data library. We confirmed that the reliability model covers almost all possible reliabilities required for general cyber-physical systems.
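A layered reliability calculation can be sketched with the textbook series and parallel formulas. This is an assumption-laden illustration, not the paper's formal model: it assumes independent component failures, treats the three layers as a series system, and uses made-up reliability values.

```python
from math import prod

def series(reliabilities):
    """All components must work: R = product of R_i."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one redundant component must work: R = 1 - product of (1 - R_i)."""
    return 1 - prod(1 - r for r in reliabilities)

# Illustrative values only.
environment = parallel([0.9, 0.9])   # two redundant sensors
communication = 0.95
computation = 0.99

system = series([environment, communication, computation])
```

Here redundancy lifts the environment layer from 0.9 to 0.99, and the system reliability is about 0.931, bounded above by its weakest layer (communication, 0.95).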

CR Apr 28, 2020
SGX-SSD: A Policy-based Versioning SSD with Intel SGX

Jinwoo Ahn, Seungjin Lee, Jinhoon Lee et al.

This paper demonstrates that SSDs that perform device-level versioning can be exposed to data tampering attacks when the retention time of data is less than the malware's dwell time. To deal with that threat, we propose SGX-SSD, an SGX-based versioning SSD which selectively preserves file history based on the given policy. The proposed system adopts Intel SGX to implement a version policy management system that is safe from high-privileged malware. Based on the policy, only the necessary data is selectively preserved in the SSD, which prevents lower-priority files from wasting space and also ensures the integrity of important files.

CR Apr 10, 2019
KEY-SSD: Access-Control Drive to Protect Files from Ransomware Attacks

Jinwoo Ahn, Donggyu Park, Chang-Gyu Lee et al.

Traditional techniques to prevent damage from ransomware attacks detect and block attacks by monitoring known behaviors such as frequent name changes, recurring access to cryptographic libraries, and key exchanges with remote servers. Unfortunately, intelligent ransomware can easily bypass these techniques. Another prevention technique is to recover from a backup copy when a file is infected with ransomware. However, data backup requires extra storage space, and the backup itself can be removed by ransomware. In this paper, we propose to implement an access control mechanism on a disk drive, called a KEY-SSD disk drive. KEY-SSD is the data store and the last barrier to data protection. Unauthorized applications cannot read file data even if they bypass the file system defense: the drive denies any block request that does not present the disk's registered block key, completely eliminating the possibility of a file being held hostage by ransomware. We have prototyped KEY-SSD and validated its usefulness by demonstrating 1) selective block access control, 2) unauthorized data access blocking, and 3) negligible performance overhead. Our comprehensive evaluation of KEY-SSD for various workloads shows that KEY-SSD performance is hardly degraded, owing to lightweight OS key transmission and access control drive optimization. We also confirmed that KEY-SSD successfully protects files against an actual ransomware sample.
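Drive-side block access control can be modeled in a few lines. This is a conceptual sketch only: the class `AccessControlDrive`, its methods, and the per-block key table are hypothetical, and the real KEY-SSD enforces this inside device firmware with its own key transmission path.

```python
import hmac

class AccessControlDrive:
    """Hypothetical in-memory model of a drive that checks a key per protected block."""

    def __init__(self):
        self.blocks = {}      # lba -> stored bytes
        self.block_keys = {}  # lba -> registered key, for protected blocks only

    def protect(self, lba, key):
        self.block_keys[lba] = key

    def _check(self, lba, key):
        registered = self.block_keys.get(lba)
        if registered is None:
            return  # unprotected block: no key needed
        # Constant-time comparison avoids leaking key bytes through timing.
        if key is None or not hmac.compare_digest(registered, key):
            raise PermissionError(f"block {lba}: missing or wrong key")

    def read(self, lba, key=None):
        self._check(lba, key)
        return self.blocks.get(lba)

    def write(self, lba, data, key=None):
        self._check(lba, key)
        self.blocks[lba] = data
```

Because the check sits at the block layer, a request for a protected block is rejected regardless of which application or file system path issued it, for both reads and overwrite attempts.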