76.2LGMay 7
Retain-Neutral Surrogates for Min-Max UnlearningJunhao Cai, Dohun Kim, Dowon Kim et al.
Machine unlearning seeks to remove the influence of designated training data while preserving performance on the remaining data. Approximate unlearning can be viewed as a local editing problem; in min-max unlearning, the key local object is the surrogate point at which the retain objective is evaluated. When forget and retain gradients are strongly aligned, an unconstrained forget-maximizing perturbation can move to a surrogate point that increases retain loss. We propose Retain-Orthogonal Surrogate Unlearning (ROSU), which constrains the inner surrogate construction by maximizing first-order forget gain subject to zero first-order retain change under a fixed perturbation budget. This yields a closed-form retain-orthogonal perturbation, a lightweight transported outer update, and amplification along the retain-neutral direction. Our analysis establishes (i) a curvature-controlled second-order bound on retain damage, (ii) a positive-alignment regime in which ROSU strictly reduces surrogate retain loss relative to standard min-max perturbations, and (iii) near-equivalence when the two gradients are nearly orthogonal. Across vision and language benchmarks (CIFAR-10/100, Tiny-ImageNet, TOFU, WMDP), the empirical pattern follows this geometry: ROSU gives its clearest gains in high-coupling regimes while remaining competitive elsewhere.
AROct 31, 2025
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU LimitsDowon Kim, MinJae Lee, Janghyeon Kim et al.
The expansion of context windows in large language models (LLMs) to multi-million tokens introduces severe memory and compute bottlenecks, particularly in managing the growing Key-Value (KV) cache. While Compute Express Link (CXL) enables non-eviction frameworks that offload the full KV-cache to scalable external memory, these frameworks still suffer from costly data transfers when recalling non-resident KV tokens to limited GPU memory as context lengths increase. This work proposes scalable Processing-Near-Memory (PNM) for 1M-Token LLM Inference, a CXL-enabled KV-cache management system that coordinates memory and computation beyond GPU limits. Our design offloads token page selection to a PNM accelerator within CXL memory, eliminating costly recalls and enabling larger GPU batch sizes. We further introduce a hybrid parallelization strategy and a steady-token selection mechanism to enhance compute efficiency and scalability. Implemented atop a state-of-the-art CXL-PNM system, our solution delivers consistent performance gains for LLMs with up to 405B parameters and 1M-token contexts. Our PNM-only offloading scheme (PNM-KV) and GPU-PNM hybrid with steady-token execution (PnG-KV) achieve up to 21.9x throughput improvement, up to 60x lower energy per token, and up to 7.3x better total cost efficiency than the baseline, demonstrating that CXL-enabled multi-PNM architectures can serve as a scalable backbone for future long-context LLM inference.
ARJan 21, 2025
Analysis of a Memcapacitor-Based for Neural Network Accelerator FrameworkAnkur Singh, Dowon Kim, Byung-Geun Lee
Data-intensive computing tasks, such as training neural networks, are crucial for artificial intelligence applications but often come with high energy demands. One promising solution is to develop specialized hardware that directly maps neural networks, utilizing arrays of memristive devices to perform parallel multiply-accumulate operations. In our research, we introduce a novel CMOS-based memcapacitor circuit that is validated using the cadence tool. Additionally, we developed the device in Python to facilitate the design of a memcapacitive-based accelerator. Our proposed framework employs a crossbar array of memcapacitor devices to train a neural network capable of digit classification and CIFAR dataset recognition. We tested the non-ideal characteristics of the constructed memcapacitor-based neural network. The system achieved an impressive 98.4% training accuracy in digit recognition and 94.4% training accuracy in CIFAR recognition, highlighting its effectiveness. This study demonstrates the potential of memcapacitor-based neural network systems in handling classification tasks and sets the stage for further advancements in neuromorphic computing.
NCNov 25, 2024
Swin fMRI Transformer Predicts Early Neurodevelopmental Outcomes from Neonatal fMRIPatrick Styll, Dowon Kim, Jiook Cha
Brain development in the first few months of human life is a critical phase characterized by rapid structural growth and functional organization. Accurately predicting developmental outcomes during this time is crucial for identifying delays and enabling timely interventions. This study introduces the SwiFT (Swin 4D fMRI Transformer) model, designed to predict Bayley-III composite scores using neonatal fMRI from the Developing Human Connectome Project (dHCP). To enhance predictive accuracy, we apply dimensionality reduction via group independent component analysis (ICA) and pretrain SwiFT on large adult fMRI datasets to address the challenges of limited neonatal data. Our analysis shows that SwiFT significantly outperforms baseline models in predicting cognitive, motor, and language outcomes, leveraging both single-label and multi-label prediction strategies. The model's attention-based architecture processes spatiotemporal data end-to-end, delivering superior predictive performance. Additionally, we use Integrated Gradients with Smoothgrad sQuare (IG-SQ) to interpret predictions, identifying neural spatial representations linked to early cognitive and behavioral development. These findings underscore the potential of Transformer models to advance neurodevelopmental research and clinical practice.