Jihoon Yang

CV
6papers
203citations
Novelty52%
AI Score47

6 Papers

CVNov 21, 2022Code
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution

Haram Choi, Jeongmin Lee, Jihoon Yang

While some studies have proven that Swin Transformer (Swin) with window self-attention (WSA) is suitable for single image super-resolution (SR), the plain WSA ignores the broad regions when reconstructing high-resolution images due to a limited receptive field. In addition, many deep learning SR methods suffer from intensive computations. To address these problems, we introduce the N-Gram context to the low-level vision with Transformers for the first time. We define N-Gram as neighboring local windows in Swin, which differs from text analysis that views N-Gram as consecutive characters or words. N-Grams interact with each other by sliding-WSA, expanding the regions seen to restore degraded pixels. Using the N-Gram context, we propose NGswin, an efficient SR network with SCDP bottleneck taking multi-scale outputs of the hierarchical encoder. Experimental results show that NGswin achieves competitive performance while maintaining an efficient structure when compared with previous leading methods. Moreover, we also improve other Swin-based SR methods with the N-Gram context, thereby building an enhanced model: SwinIR-NG. Our improved SwinIR-NG outperforms the current best lightweight SR approaches and establishes state-of-the-art results. Codes are available at https://github.com/rami0205/NGramSwin.

CVApr 4, 2023Code
Exploration of Lightweight Single Image Denoising with Transformers and Truly Fair Training

Haram Choi, Cheolwoong Na, Jinseop Kim et al.

As multimedia content often contains noise from intrinsic defects of digital devices, image denoising is an important step for high-level vision recognition tasks. Although several studies have developed the denoising field employing advanced Transformers, these networks are too momory-intensive for real-world applications. Additionally, there is a lack of research on lightweight denosing (LWDN) with Transformers. To handle this, this work provides seven comparative baseline Transformers for LWDN, serving as a foundation for future research. We also demonstrate the parts of randomly cropped patches significantly affect the denoising performances during training. While previous studies have overlooked this aspect, we aim to train our baseline Transformers in a truly fair manner. Furthermore, we conduct empirical analyses of various components to determine the key considerations for constructing LWDN Transformers. Codes are available at https://github.com/rami0205/LWDN.

CVMay 19, 2023Code
Reciprocal Attention Mixing Transformer for Lightweight Image Restoration

Haram Choi, Cheolwoong Na, Jihyeon Oh et al.

Although many recent works have made advancements in the image restoration (IR) field, they often suffer from an excessive number of parameters. Another issue is that most Transformer-based IR methods focus only on either local or global features, leading to limited receptive fields or deficient parameter issues. To address these problems, we propose a lightweight IR network, Reciprocal Attention Mixing Transformer (RAMiT). It employs our proposed dimensional reciprocal attention mixing Transformer (D-RAMiT) blocks, which compute bi-dimensional (spatial and channel) self-attentions in parallel with different numbers of multi-heads. The bi-dimensional attentions help each other to complement their counterpart's drawbacks and are then mixed. Additionally, we introduce a hierarchical reciprocal attention mixing (H-RAMi) layer that compensates for pixel-level information losses and utilizes semantic information while maintaining an efficient hierarchical structure. Furthermore, we revisit and modify MobileNet V1 and V2 to attach efficient convolutions to our proposed components. The experimental results demonstrate that RAMiT achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining. Codes are available at https://github.com/rami0205/RAMiT.

59.2DCApr 29
DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

Bodon Jeong, Hongsu Byun, Youngjae Kim et al.

The increasing deployment of Large Language Model (LLM) inference on edge AI systems demands efficient execution under tight memory budgets. A key challenge arises from Key-Value (KV) caches, which often exceed available device memory. Although NVMe-based offloading offers scalable capacity, existing file-based designs rely heavily on the kernel page cache, leading to cache thrashing, unpredictable latency, and high software overhead under memory pressure. We present DUAL-BLADE, a dual-path KV residency framework that dynamically assigns KV tensors to either a page-cache path or an NVMe-direct path based on runtime memory availability. The NVMe-direct path bypasses the filesystem by mapping KV tensors to contiguous logical block address (LBA) regions, enabling low-overhead direct storage access. DUAL-BLADE further incorporates adaptive pipeline parallelism to overlap storage I/O with GPU DMA, improving inference throughput. Our evaluation shows that DUAL-BLADE substantially mitigates I/O bottlenecks, reducing prefill and decode latency by up to 33.1% and 42.4%, respectively, while improving SSD utilization by 2.2x across diverse memory budgets.

LGJun 17, 2020
FrostNet: Towards Quantization-Aware Network Architecture Search

Taehoon Kim, YoungJoon Yoo, Jihoon Yang

INT8 quantization has become one of the standard techniques for deploying convolutional neural networks (CNNs) on edge devices to reduce the memory and computational resource usages. By analyzing quantized performances of existing mobile-target network architectures, we can raise an issue regarding the importance of network architecture for optimal INT8 quantization. In this paper, we present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances. We first propose critical but straightforward optimization method which enables quantization-aware training (QAT) : floating-point statistic assisting (StatAssist) and stochastic gradient boosting (GradBoost). By integrating the gradient-based NAS with StatAssist and GradBoost, we discovered a quantization-efficient network building block, Frost bottleneck. Furthermore, we used Frost bottleneck as the building block for hardware-aware NAS to obtain quantization-efficient networks, FrostNets, which show improved quantization performances compared to other mobile-target networks while maintaining competitive FLOAT32 performance. Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized, due to higher latency reduction rate (average 65%).

CLMay 20, 2018
Abstractive Text Classification Using Sequence-to-convolution Neural Networks

Taehoon Kim, Jihoon Yang

We propose a new deep neural network model and its training scheme for text classification. Our model Sequence-to-convolution Neural Networks(Seq2CNN) consists of two blocks: Sequential Block that summarizes input texts and Convolution Block that receives summary of input and classifies it to a label. Seq2CNN is trained end-to-end to classify various-length texts without preprocessing inputs into fixed length. We also present Gradual Weight Shift(GWS) method that stabilizes training. GWS is applied to our model's loss function. We compared our model with word-based TextCNN trained with different data preprocessing methods. We obtained significant improvement in classification accuracy over word-based TextCNN without any ensemble or data augmentation.