Changmin Shin

2papers

2 Papers

85.0ARApr 6Code
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM

Junguk Hong, Changmin Shin, Sukjin Kim et al.

Lookup tables (LUTs) have recently gained attention as an alternative compute mechanism that maps input operands to precomputed results, eliminating the need for arithmetic logic. LUTs not only reduce logic complexity, but also naturally support diverse numerical precisions without requiring separate circuits for each bitwidth-an increasingly important feature in quantized DNNs. This creates a favorable tradeoff in PIM: memory capacity can be used in place of logic to increase computational throughput, aligning well with DRAM-PIM architectures that offer high bandwidth and easily available memory but limited logic density. In this work, we explore this capacity-computation tradeoff in LUT-based PIM designs, where memory capacity is traded for performance by packing multiple MAC operations into a single LUT lookup. Building on this insight, we propose LOCALUT, a PIM-based design for efficient low-bit quantized DNN inference using operation-packed LUTs. First, we observe that these LUTs contain extensive redundancy and introduce LUT canonicalization, which eliminates duplicate entries to reduce LUT size. Second, we propose reordering LUT, a lightweight auxiliary LUT that remaps weight vectors to their canonical form required by LUT canonicalization with a simple LUT lookup. Third, we propose LUT slice streaming, a novel execution strategy that exploits the DRAM-buffer hierarchy by streaming only relevant LUT columns into the buffer and reusing them across multiple weight vectors. Evaluated on a real system based on UPMEM devices, we demonstrate a geometric mean speedup of 1.82x across various numeric precisions and DNN models. We believe LOCALUT opens a path toward scalable, low-logic PIM designs tailored for LUT-based DNN inference. Our implementation of LOCALUT is available at https://github.com/AIS-SNU/LoCaLUT.

61.6DCMay 12
NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search

Jaeyong Song, Hongsun Jang, Changmin Shin et al.

On-disk graph-based vector search (GVS) has become the dominant approach for serving large-scale vector databases at high recall, but prior systems struggle to sustain concurrent search and update throughput on high-dimensional workloads. We find the main cause of this in position seeking, a full graph traversal that every update performs to locate neighbors before linking the new vector into the graph. Position seeking is fundamentally heavier than a search query, and its cost is further amplified by two systemic limitations of current GVS systems, packed layouts that couple every edge fetch to a full vector load, and a static entrance graph whose entry points drift away from newly inserted regions as updates accumulate. We present NAVIS, an on-SSD GVS system that drives down position-seeking overhead through (i) a layout-supported selective vector read that breaks the packed-page coupling without losing its locality benefits, (ii) a dynamic lightweight entrance graph update mechanism that reuses traversal information already produced by concurrent updates, and (iii) an entrance graph-aware edgelist cache that concentrates capacity on high-reuse paths near refreshed entry points. Across multiple large-scale high-dimensional benchmarks, NAVIS enhances average insertion throughput by up to 2.74x and average concurrent search throughput by up to 1.37x while reducing average search latency by up to 25.26%.