Brent ByungHoon Kang

CR
h-index9
9papers
33citations
Novelty62%
AI Score53

9 Papers

CLFeb 2
Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual Embeddings

Doohyun Kim, Donghwa Kang, Kyungjae Lee et al.

The proliferation of retrieval-augmented generation (RAG) has established vector databases as critical infrastructure, yet they introduce severe privacy risks via embedding inversion attacks. Existing paradigms face a fundamental trade-off: optimization-based methods require computationally prohibitive queries, while alignment-based approaches hinge on the unrealistic assumption of accessible in-domain training data. These constraints render them ineffective in strict black-box and cross-domain settings. To dismantle these barriers, we introduce Zero2Text, a novel training-free framework based on recursive online alignment. Unlike methods relying on static datasets, Zero2Text synergizes LLM priors with a dynamic ridge regression mechanism to iteratively align generation to the target embedding on-the-fly. We further demonstrate that standard defenses, such as differential privacy, fail to effectively mitigate this adaptive threat. Extensive experiments across diverse benchmarks validate Zero2Text; notably, on MS MARCO against the OpenAI victim model, it achieves 1.8x higher ROUGE-L and 6.4x higher BLEU-2 scores compared to baselines, recovering sentences from unknown domains without a single leaked data pair.

CVMar 6
CR-QAT: Curriculum Relational Quantization-Aware Training for Open-Vocabulary Object Detection

Jinyeong Park, Donghwa Kim, Brent ByungHoon Kang et al.

Open-vocabulary object detection (OVOD) enables novel category detection via vision-language alignment, but massive model sizes hinder deployment on resource-constrained devices. While quantization offers practical compression, we reveal that naive extreme low-bit (e.g., 4-bit) quantization severely degrades fine-grained vision-language alignment and distorts inter-region relational structures. To address this, we propose curriculum relational quantization-aware training (CR-QAT), an integrated framework combining stage-by-stage optimization with relational knowledge distillation. Within CR-QAT, curriculum QAT (CQAT) mitigates error accumulation by partitioning the model for progressive quantization, ensuring stable optimization via error isolation. Concurrently, text-centric relational KD (TRKD) is applied to task-relevant modules. By constructing text-anchored pairwise similarity matrices, TRKD comprehensively transfers the teacher's multi-dimensional relational knowledge. Experiments on LVIS and COCO zero-shot benchmarks demonstrate that CR-QAT consistently outperforms existing QAT baselines under aggressive low-bit settings, achieving relative AP improvements of up to 38.9% and 40.9%, respectively.

CRMar 6
SPOILER: TEE-Shielded DNN Partitioning of On-Device Secure Inference with Poison Learning

Donghwa Kang, Hojun Choe, Doohyun Kim et al.

Deploying deep neural networks (DNNs) on edge devices exposes valuable intellectual property to model-stealing attacks. While TEE-shielded DNN partitioning (TSDP) mitigates this by isolating sensitive computations, existing paradigms fail to simultaneously satisfy privacy and efficiency. The training-before-partition paradigm suffers from intrinsic privacy leakage, whereas the partition-before-training paradigm incurs severe latency due to structural dependencies that hinder parallel execution. To overcome these limitations, we propose SPOILER, a novel search-before-training framework that fundamentally decouples the TEE sub-network from the backbone via hardware-aware neural architecture search (NAS). SPOILER identifies a lightweight TEE architecture strictly optimized for hardware constraints, maximizing parallel efficiency. Furthermore, we introduce self-poisoning learning to enforce logical isolation, rendering the exposed backbone functionally incoherent without the TEE component. Extensive experiments on CNNs and Transformers demonstrate that SPOILER achieves state-of-the-art trade-offs between security, latency, and accuracy.

LGAug 19, 2025
STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers

Donghwa Kang, Doohyun Kim, Sang-Ki Ko et al.

Spiking neural networks (SNNs) offer energy efficiency over artificial neural networks (ANNs) but suffer from high latency and computational overhead due to their multi-timestep operational nature. While various dynamic computation methods have been developed to mitigate this by targeting spatial, temporal, or architecture-specific redundancies, they remain fragmented. While the principles of adaptive computation time (ACT) offer a robust foundation for a unified approach, its application to SNN-based vision Transformers (ViTs) is hindered by two core issues: the violation of its temporal similarity prerequisite and a static architecture fundamentally unsuited for its principles. To address these challenges, we propose STAS (Spatio-Temporal Adaptive computation time for Spiking transformers), a framework that co-designs the static architecture and dynamic computation policy. STAS introduces an integrated spike patch splitting (I-SPS) module to establish temporal stability by creating a unified input representation, thereby solving the architectural problem of temporal dissimilarity. This stability, in turn, allows our adaptive spiking self-attention (A-SSA) module to perform two-dimensional token pruning across both spatial and temporal axes. Implemented on spiking Transformer architectures and validated on CIFAR-10, CIFAR-100, and ImageNet, STAS reduces energy consumption by up to 45.9%, 43.8%, and 30.1%, respectively, while simultaneously improving accuracy over SOTA models.

CVAug 19, 2025
Timestep-Compressed Attack on Spiking Neural Networks through Timestep-Level Backpropagation

Donghwa Kang, Doohyun Kim, Sang-Ki Ko et al.

State-of-the-art (SOTA) gradient-based adversarial attacks on spiking neural networks (SNNs), which largely rely on extending FGSM and PGD frameworks, face a critical limitation: substantial attack latency from multi-timestep processing, rendering them infeasible for practical real-time applications. This inefficiency stems from their design as direct extensions of ANN paradigms, which fail to exploit key SNN properties. In this paper, we propose the timestep-compressed attack (TCA), a novel framework that significantly reduces attack latency. TCA introduces two components founded on key insights into SNN behavior. First, timestep-level backpropagation (TLBP) is based on our finding that global temporal information in backpropagation to generate perturbations is not critical for an attack's success, enabling per-timestep evaluation for early stopping. Second, adversarial membrane potential reuse (A-MPR) is motivated by the observation that initial timesteps are inefficiently spent accumulating membrane potential, a warm-up phase that can be pre-calculated and reused. Our experiments on VGG-11 and ResNet-17 with the CIFAR-10/100 and CIFAR10-DVS datasets show that TCA significantly reduces the required attack latency by up to 56.6% and 57.1% compared to SOTA methods in white-box and black-box settings, respectively, while maintaining a comparable attack success rate.

SYMay 29, 2025
CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection

Woojin Shin, Donghwa Kang, Byeongyun Park et al.

Detection Transformers (DETR) are increasingly adopted in autonomous vehicle (AV) perception systems due to their superior accuracy over convolutional networks. However, concurrently executing multiple DETR tasks presents significant challenges in meeting firm real-time deadlines (R1) and high accuracy requirements (R2), particularly for safety-critical objects, while navigating the inherent latency-accuracy trade-off under resource constraints. Existing real-time DNN scheduling approaches often treat models generically, failing to leverage Transformer-specific properties for efficient resource allocation. To address these challenges, we propose CF-DETR, an integrated system featuring a novel coarse-to-fine Transformer architecture and a dedicated real-time scheduling framework NPFP**. CF-DETR employs three key strategies (A1: coarse-to-fine inference, A2: selective fine inference, A3: multi-level batch inference) that exploit Transformer properties to dynamically adjust patch granularity and attention scope based on object criticality, aiming to satisfy R2. The NPFP** scheduling framework (A4) orchestrates these adaptive mechanisms A1-A3. It partitions each DETR task into a safety-critical coarse subtask for guaranteed critical object detection within its deadline (ensuring R1), and an optional fine subtask for enhanced overall accuracy (R2), while managing individual and batched execution. Our extensive evaluations on server, GPU-enabled embedded platforms, and actual AV platforms demonstrate that CF-DETR, under an NPFP** policy, successfully meets strict timing guarantees for critical operations and achieves significantly higher overall and critical object detection accuracy compared to existing baselines across diverse AV workloads.

SYJan 29, 2025
Real Time Scheduling Framework for Multi Object Detection via Spiking Neural Networks

Donghwa Kang, Woojin Shin, Cheol-Ho Hong et al.

Given the energy constraints in autonomous mobile agents (AMAs), such as unmanned vehicles, spiking neural networks (SNNs) are increasingly favored as a more efficient alternative to traditional artificial neural networks. AMAs employ multi-object detection (MOD) from multiple cameras to identify nearby objects while ensuring two essential objectives, (R1) timing guarantee and (R2) high accuracy for safety. In this paper, we propose RT-SNN, the first system design, aiming at achieving R1 and R2 in SNN-based MOD systems on AMAs. Leveraging the characteristic that SNNs gather feature data of input image termed as membrane potential, through iterative computation over multiple timesteps, RT-SNN provides multiple execution options with adjustable timesteps and a novel method for reusing membrane potential to support R1. Then, it captures how these execution strategies influence R2 by introducing a novel notion of mean absolute error and membrane confidence. Further, RT-SNN develops a new scheduling framework consisting of offline schedulability analysis for R1 and a run-time scheduling algorithm for R2 using the notion of membrane confidence. We deployed RT-SNN to Spiking-YOLO, the SNN-based MOD model derived from ANN-to-SNN conversion, and our experimental evaluation confirms its effectiveness in meeting the R1 and R2 requirements while providing significant energy efficiency.

CRJul 3, 2018
Rethinking Misalignment to Raise the Bar for Heap Pointer Corruption

Daehee Jang, Jonghwan Kim, Minjoon Park et al.

Heap layout randomization renders a good portion of heap vulnerabilities unexploitable. However, some remnants of the vulnerabilities are still exploitable even under the randomized layout. According to our analysis, such heap exploits often abuse pointer-width allocation granularity to spray crafted pointers. To address this problem, we explore the efficacy of byte-granularity (the most fine-grained) heap randomization. Heap randomization, in general, has been a well-trodden area; however, the efficacy of byte-granularity randomization has never been fully explored as \emph{misalignment} raises various concerns. This paper unravels the pros and cons of byte-granularity heap randomization by conducting comprehensive analysis in three folds: (i) security effectiveness, (ii) performance impact, and (iii) compatibility analysis to measure deployment cost. Security discussion based on 20 CVE case studies suggests that byte-granularity heap randomization raises the bar against heap exploits more than we initially expected; as pointer spraying approach is becoming prevalent in modern heap exploits. Afterward, to demystify the skeptical concerns regarding misalignment, we conduct cycle-level microbenchmarks and report that the performance cost is highly concentrated to edge cases depending on L1-cache line. Based on such observations, we design and implement an allocator suited to optimize the performance cost of byte-granularity heap randomization; then evaluate the performance with the memory-intensive benchmark (SPEC2006). Finally, we discuss compatibility issues using Coreutils, Nginx, and ChakraCore.

CRMay 30, 2018
Lord of the x86 Rings: A Portable User Mode Privilege Separation Architecture on x86

Hojoon Lee, Chihyun Song, Brent Byunghoon Kang

Modern applications are increasingly advanced and complex, and inevitably contain exploitable software bugs despite the ongoing efforts. The applications today often involve processing of sensitive information. However, the lack of privilege separation within the user space leaves sensitive application secret such as cryptographic keys just as unprotected as a "hello world" string. Cutting-edge hardware-supported security features are being introduced. However, the features are often vendor-specific or lack compatibility with older generations of the processors. The situation leaves developers with no portable solution to incorporate protection for the sensitive application component. We propose LOTRx86, a fundamental and portable approach for user space privilege separation. Our approach creates a more privileged user execution layer called PrivUser through harnessing the underused intermediate privilege levels on the x86 architecture. The PrivUser memory space, a set of pages within process address space that are inaccessible to user mode, is a safe place for application secrets and routines that access them. We implement the LOTRx86 ABI that exports the privilege-based, accessing the protected application secret only requires a change in the privilege, eliminating the need for costly remote procedure calls or change in address space. We evaluated our platform by developing a proof-of-concept LOTRx86-enabled web server that employs our architecture to securely access its private key during SSL connection and thereby mitigating the HeartBleed vulnerability by design. We conducted a set of experiments including a performance measurement on the PoC on both Intel and AMD PCs, and confirmed that LOTRx86 incurs only a limited performance overhead.