Kyoungmin Kim

DB
h-index: 6
3 papers
7 citations
Novelty: 32%
AI Score: 32

3 Papers

25.2 · DB · Mar 17
Work Sharing and Offloading for Efficient Approximate Threshold-based Vector Join

Kyoungmin Kim, Lennart Roth, Liang Liang et al.

Vector joins, which find all vector pairs between a set of query vectors and a set of data vectors whose distances are below a given threshold, are fundamental to modern vector and vector-relational database systems that power multimodal retrieval and semantic analytics. The existing state-of-the-art approach exploits work sharing among similar queries but still suffers from redundant index traversals and excessive distance computations. We propose a unified framework for efficient approximate vector joins that (1) introduces soft work sharing to reuse traversal results beyond the join results of previous queries, (2) builds a merged index over both query and data vectors to further speed up graph exploration, and (3) improves robustness for out-of-distribution queries through an adaptive hybrid search strategy. Experiments on eight datasets demonstrate substantial improvements in the efficiency-recall trade-off over the state of the art.
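
To pin down the problem definition above, here is a minimal exact baseline, a brute-force nested-loop join. It is not the paper's method: it assumes Euclidean distance and a strict "below the threshold" test, and the function name `threshold_vector_join` is hypothetical. It makes explicit the |Q| x |D| distance computations that approximate, graph-index-based joins aim to cut.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def threshold_vector_join(
    queries: List[Vector], data: List[Vector], theta: float
) -> List[Tuple[int, int]]:
    """Return every (query_index, data_index) pair whose distance is below theta.

    Hypothetical exact baseline (assumption: Euclidean distance, strict '<').
    It performs len(queries) * len(data) distance computations, with no index
    and no work sharing between similar queries.
    """
    result: List[Tuple[int, int]] = []
    for qi, q in enumerate(queries):
        for di, d in enumerate(data):
            # math.dist computes the Euclidean distance between two points
            if math.dist(q, d) < theta:
                result.append((qi, di))
    return result


if __name__ == "__main__":
    queries = [(0.0, 0.0), (5.0, 5.0)]
    data = [(0.0, 1.0), (3.0, 4.0), (5.0, 5.5)]
    print(threshold_vector_join(queries, data, 1.5))  # [(0, 0), (1, 2)]
```

With `theta = 1.5`, query 0 matches only data vector 0 (distance 1.0) and query 1 matches only data vector 2 (distance 0.5). Every other pair is still evaluated and rejected, which is the waste that index traversal and result reuse are meant to avoid.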

PF · Nov 12, 2024
Faster LLM Inference using DBMS-Inspired Preemption and Cache Replacement Policies

Kyoungmin Kim, Jiacheng Li, Kijae Hong et al.

LLMs are increasingly used worldwide, from daily tasks to agentic systems and data analytics, and they require significant GPU resources. LLM inference systems, however, are slow compared to database systems, and their performance and mechanisms have often been regarded as a black box, limiting the use of LLMs inside databases and other performance-critical applications. This paper first analyzes LLM inference performance and then focuses on a data management issue within LLM inference. We find that, when executing multiple concurrent inference requests, inference systems lack an adequate resource cost model and optimization strategy for scheduling requests whose intermediate results reside in a cache in GPU memory. We adapt classic database techniques by building cost models for concurrent inference requests and a new cache replacement policy tailored for LLM inference, which can substantially reduce GPU costs.

DB · Dec 23, 2024
Trustworthy and Efficient LLMs Meet Databases

Kyoungmin Kim, Anastasia Ailamaki

In the rapidly evolving AI era, with large language models (LLMs) at its core, making LLMs more trustworthy and efficient, especially in output generation (inference), has gained significant attention. These efforts aim to reduce plausible but faulty LLM outputs (a.k.a. hallucinations) and to meet sharply increased inference demand. This tutorial explores such efforts and makes them transparent to the database community. Understanding these efforts is essential both for harnessing LLMs in database tasks and for adapting database techniques to LLMs. Furthermore, we delve into the synergy between LLMs and databases, highlighting new opportunities and challenges at their intersection. This tutorial aims to share essential concepts and strategies around LLMs with database researchers and practitioners, reduce their unfamiliarity with LLMs, and inspire work at the intersection of LLMs and databases.