Jongin Choi

AR
h-index6
3papers
1citation
Novelty52%
AI Score43

3 Papers

ARMay 20
CMAX-CAMEL: A Coarse-to-Fine Adaptive, Memory-Efficient, and Low-Power Edge Processor for Contrast Maximization

Kyeongpil Min, Jongin Choi, Kyeongwon Lee et al.

Contrast maximization (CMAX) is a direct geometric framework for event-based motion estimation, but its iterative warp-and-accumulate pipeline incurs input-dependent computation and frequent memory accesses, challenging real-time, low-power edge deployment. We present CMAX-CAMEL, a coarse-to-fine adaptive, memory-efficient, low-power edge processor for CMAX. CMAX-CAMEL combines a runtime-adaptive execution strategy with a memory-centric processor architecture. It adjusts coarse-to-fine execution according to the observed event distribution, prioritizing stages likely to improve estimation accuracy while suppressing low-value iterations and unnecessary stage transitions. Architecturally, a banked parallel memory organization sustains real-time throughput while reducing latency, and a subsampling-coupled accumulation structure lowers memory-access activity along the warp-and-accumulate dataflow. On a Virtex FPGA prototype operating at 200 MHz, CMAX-CAMEL improves estimation accuracy by up to 19% over fixed coarse-to-fine schedules, reduces processing latency by 53.3%, lowers effective memory accesses by 42%, and cuts total system energy by 52.2%, including adaptation overheads. These results show that CMAX-CAMEL is an HW-SW co-design that co-optimizes execution policy and data movement for real-time, low-power event-based motion estimation at the edge.

ARMay 20
SA-Kura: An Energy-Efficient Systolic Array Accelerator for Locally-Coupled Kuramoto Drift in Diffusion Sampling

Jeongmin Jin, Kyeongwon Lee, Mundo Jeong et al.

Diffusion inference remains costly for edge deployment, yet existing accelerators focus almost exclusively on score networks because standard drift is merely a trivial linear scaling. Kuramoto orientation diffusion replaces this trivial drift with locally coupled phase interactions, improving sampling efficiency but introducing a new hardware bottleneck: a center-dependent nonlinear 5 x 5 stencil evaluated at every reverse step. This kernel maps poorly to conventional CNN accelerators and matrix-oriented engines. We present SA-Kura, to our knowledge the first digital systolic-array accelerator dedicated to locally coupled Kuramoto drift. By reformulating pair-wise sinusoidal coupling into neighbor accumulation independent of the center phase followed by a single center-dependent multiply-subtract combination, SA-Kura eliminates in-PE transcendental units and enables regular systolic execution with register-level reuse. SA-Kura was implemented in synthesizable RTL, integrated into a lightweight RISC-V-based SoC, prototyped on FPGA, and evaluated through 45 nm CMOS synthesis and power analysis. For the drift kernel only, compared with software execution of the same kernel on the processor core in the same SoC platform, SA-Kura reduces latency and energy by 193x and 69.4x, respectively. Compared with a standalone Jetson Orin Nano CUDA implementation of the same kernel, it is 6.57x faster and achieves approximately 46.0x lower energy per pixel.

LGNov 6, 2025
FiCABU: A Fisher-Based, Context-Adaptive Machine Unlearning Processor for Edge AI

Eun-Su Cho, Jongin Choi, Jeongmin Jin et al.

Machine unlearning, driven by privacy regulations and the "right to be forgotten", is increasingly needed at the edge, yet server-centric or retraining-heavy methods are impractical under tight computation and energy budgets. We present FiCABU (Fisher-based Context-Adaptive Balanced Unlearning), a software-hardware co-design that brings unlearning to edge AI processors. FiCABU combines (i) Context-Adaptive Unlearning, which begins edits from back-end layers and halts once the target forgetting is reached, with (ii) Balanced Dampening, which scales dampening strength by depth to preserve retain accuracy. These methods are realized in a full RTL design of a RISC-V edge AI processor that integrates two lightweight IPs for Fisher estimation and dampening into a GEMM-centric streaming pipeline, validated on an FPGA prototype and synthesized in 45 nm for power analysis. Across CIFAR-20 and PinsFaceRecognition with ResNet-18 and ViT, FiCABU achieves random-guess forget accuracy while matching the retraining-free Selective Synaptic Dampening (SSD) baseline on retain accuracy, reducing computation by up to 87.52 percent (ResNet-18) and 71.03 percent (ViT). On the INT8 hardware prototype, FiCABU further improves retain preservation and reduces energy to 6.48 percent (CIFAR-20) and 0.13 percent (PinsFaceRecognition) of the SSD baseline. In sum, FiCABU demonstrates that back-end-first, depth-aware unlearning can be made both practical and efficient for resource-constrained edge AI devices.