SYMay 21, 2018
Detection of Sensor Attack and Resilient State Estimation for Uniformly Observable Nonlinear Systems having Redundant SensorsJunsoo Kim, Chanhwa Lee, Hyungbo Shim et al.
This paper presents a detection algorithm for sensor attacks and a resilient state estimation scheme for a class of uniformly observable nonlinear systems. An adversary is supposed to corrupt a subset of sensors with the possibly unbounded signals, while the system has sensor redundancy. We design an individual high-gain observer for each measurement output so that only the observable portion of the system state is obtained. Then, a nonlinear error correcting problem is solved by collecting all the information from those partial observers and exploiting redundancy. A computationally efficient, on-line monitoring scheme is presented for attack detection. Based on the attack detection scheme, an algorithm for resilient state estimation is provided. The simulation results demonstrate the effectiveness of the proposed algorithm.
13.0SYMay 21
A Learning With Errors based encryption scheme for dynamic controllers that discloses residue signal for anomaly detectionYeongjun Jang, Joowon Lee, Junsoo Kim et al.
Although encrypted control systems ensure confidentiality of private data, it is challenging to detect anomalies without the secret key as all signals remain encrypted. To address this issue, we propose a homomorphic encryption scheme for dynamic controllers that automatically discloses the residue signal for anomaly detection, while keeping all other signals private. To this end, we characterize the zero-dynamics of an encrypted dynamic system over a finite field of integers and incorporate it into a Learning With Errors (LWE) based scheme. We then present a method to further utilize the disclosed residue signal for implementing dynamic controllers over encrypted data, which does not involve re-encryption even when they have non-integer state matrices.
SYSep 22, 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text GenerationSeongmin Hong, Seungjae Moon, Junsoo Kim et al.
Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which needs the processing of a large input context in the summarization stage, followed by the generation stage that produces a single word at a time. The conventional platforms such as GPU are specialized for the parallel processing of large inputs in the summarization stage, but their performance significantly degrades in the generation stage due to its sequential characteristic. Therefore, an efficient hardware platform is required to address the high latency caused by the sequential characteristic of text generation. In this paper, we present DFX, a multi-FPGA acceleration appliance that executes GPT-2 model inference end-to-end with low latency and high throughput in both summarization and generation stages. DFX uses model parallelism and optimized dataflow that is model-and-hardware-aware for fast simultaneous workload execution among devices. Its compute cores operate on custom instructions and provide GPT-2 operations end-to-end. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the channels of the high bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. DFX achieves 5.58x speedup and 3.99x energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21x more cost-effective than the GPU appliance, suggesting that it is a promising solution for text generation workloads in cloud datacenters.
88.2SYMay 19
Sensor Attack Detection Method for Encrypted State ObserversYeongjun Jang, Sangwon Lee, Junsoo Kim
This paper proposes an encrypted state observer that is capable of detecting sensor attacks without decryption. We first design a state observer that operates over a finite field of integers with the modular arithmetic. The observer generates a residue signal that indicates the presence of attacks under sparse attack and sensing redundancy conditions. Then, we develop a homomorphic encryption scheme that enables the observer to operate over encrypted data while automatically disclosing the residue signal. Unlike our previous work restricted to single-input single-output systems, the proposed scheme is applicable to general multi-input multi-output systems. Given that the disclosed residue signal remains below a prescribed threshold, the full state can be recovered as an encrypted message.
21.6SYMar 19
A Distributionally Robust Optimal Control Approach for Differentially Private Dynamical SystemsYeongjun Jang, Kaoru Teranishi, Junsoo Kim
In this paper, we develop a distributionally robust optimal control approach for differentially private dynamical systems, enabling a plant to securely outsource control computation to an untrusted remote server. We consider a plant that ensures differential privacy of its state trajectory by injecting calibrated noise into its output measurements. Unlike prior works, we assume that the server only has access to an ambiguity set consisting of admissible noise distributions, rather than the exact distribution. To account for this uncertainty, the server formulates a distributionally robust optimal control problem to minimize the worst-case expected cost over all admissible noise distributions. However, the formulated problem is computationally intractable due to the nonconvexity of the ambiguity set. To overcome this, we relax it into a convex Kullback--Leibler divergence ball, so that the reformulated problem admits a tractable closed-form solution.
44.0SYMar 19
Variational Encrypted Model Predictive ControlJihoon Suh, Yeongjun Jang, Junsoo Kim et al.
We develop a variational encrypted model predictive control (VEMPC) protocol whose online execution relies only on encrypted polynomial operations. The proposed approach reformulates the MPC problem into a sampling-based estimator, in which the computation of the quadratic cost is naturally handled by tilting the sampling distribution, thus reducing online encrypted computation. The resulting protocol requires no additional communication rounds or intermediate decryption, and scales efficiently through two complementary levels of parallelism. We analyze the effect of encryption-induced errors on optimality, and simulation results demonstrate the practical applicability of the proposed method.
ARMar 24, 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache QuantizationMinsu Kim, Seongmin Hong, RyeoWook Ko et al.
Modern Large Language Model serving system batches multiple requests to achieve high throughput, while batching attention operations is challenging, rendering memory bandwidth a critical bottleneck. The community relies on high-end GPUs with multiple high-bandwidth memory channels. Unfortunately, HBM's high bandwidth often comes at the expense of limited memory capacity, which reduces core utilization and increases costs. Recent advancements enabling longer contexts for LLMs have substantially increased the key-value cache size, further intensifying the pressures on memory capacity. The literature has explored KV cache quantization techniques, which commonly use low bitwidth for most values, selectively using higher bitwidth for outlier values. While this approach helps achieve high accuracy and low bitwidth simultaneously, it comes with the limitation that cost for online outlier detection is excessively high, negating the advantages. We propose Oaken, an acceleration solution that achieves high accuracy and high performance simultaneously through co-designing algorithm and hardware. To effectively find a sweet spot in the accuracy-performance trade-off space of KV cache quantization, Oaken employs an online-offline hybrid approach, setting outlier thresholds offline, which are then used to determine the quantization scale online. To translate the proposed algorithmic technique into tangible performance gains, Oaken also comes with custom quantization engines and memory management units that can be integrated with any LLM accelerators. We built an Oaken accelerator on top of an LLM accelerator, LPU, and conducted a comprehensive evaluation. Our experiments show that for a batch size of 256, Oaken achieves up to 1.58x throughput improvement over NVIDIA A100 GPU, incurring a minimal accuracy loss of only 0.54\% on average, compared to state-of-the-art KV cache quantization techniques.
CVApr 1, 2021
LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic AgentsByeoungDo Kim, Seong Hyeon Park, Seokhwan Lee et al.
In this paper, we address the problem of predicting the future motion of a dynamic agent (called a target agent) given its current and past states as well as the information on its environment. It is paramount to develop a prediction model that can exploit the contextual information in both static and dynamic environments surrounding the target agent and generate diverse trajectory samples that are meaningful in a traffic context. We propose a novel prediction model, referred to as the lane-aware prediction (LaPred) network, which uses the instance-level lane entities extracted from a semantic map to predict the multi-modal future trajectories. For each lane candidate found in the neighborhood of the target agent, LaPred extracts the joint features relating the lane and the trajectories of the neighboring agents. Then, the features for all lane candidates are fused with the attention weights learned through a self-supervised learning task that identifies the lane candidate likely to be followed by the target agent. Using the instance-level lane information, LaPred can produce the trajectories compliant with the surroundings better than 2D raster image-based methods and generate the diverse future trajectories given multiple lane candidates. The experiments conducted on the public nuScenes dataset and Argoverse dataset demonstrate that the proposed LaPred method significantly outperforms the existing prediction models, achieving state-of-the-art performance in the benchmarks.
SYApr 17, 2019
Comprehensive Introduction to Fully Homomorphic Encryption for Dynamic Feedback Controller via LWE-based CryptosystemJunsoo Kim, Hyungbo Shim, Kyoohyung Han
The cryptosystem based on the Learning-with-Errors (LWE) problem is considered as a post-quantum cryptosystem, because it is not based on the factoring problem with large primes which is easily solved by a quantum computer. Moreover, the LWE-based cryptosystem allows fully homomorphic arithmetics so that two encrypted variables can be added and multiplied without decrypting them. This chapter provides a comprehensive introduction to the LWE-based cryptosystem with examples. A key to the security of the LWE-based cryptosystem is the injection of random errors in the ciphertexts, which however hinders unlimited recursive operation of homomorphic arithmetics on ciphertexts due to the growth of the error. We show that this limitation can be overcome when the cryptosystem is used for a dynamic feedback controller that guarantees stability of the closed-loop system. Finally, we illustrate through MATLAB codes how the LWE-based cryptosystem can be customized to build a secure feedback control system. This chapter is written for the control engineers who do not have background on cryptosystems.