From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG
For developers of on-device personal AI agents, EPIC enables efficient preference-aligned retrieval under tight memory budgets, solving a critical bottleneck in deploying LLMs on edge devices.
EPIC addresses the memory bottleneck in on-device RAG by constructing a compact index focused on user preferences. Relative to the best-performing baseline, it reduces indexing memory by 2,404x, improves preference-following accuracy by 20.17 percentage points, and achieves 33.33x lower retrieval latency.
With the rapid emergence of personal AI agents based on Large Language Models (LLMs), running them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. Under tight memory budgets, however, the core bottleneck is deciding what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which treats user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and steers retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by 2,404 times, improves preference-following accuracy by 20.17 percentage points, and achieves 33.33 times lower retrieval latency relative to the best-performing baseline. In our on-device experiment, EPIC maintains a memory footprint under 1 MB with a latency of 29.35 ms/query under streaming updates.
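To make the two stages described above concrete (selective retention at indexing time, then retrieval over the retained preferences), here is a minimal Python sketch. It is not EPIC's implementation: the abstract does not specify how preference relevance is detected or how retrieval is scored. The cue-phrase filter, the token-overlap scorer, and every name (`PREFERENCE_CUES`, `PreferenceIndex`, `max_entries`) are hypothetical stand-ins chosen only to illustrate the idea of storing less so that a bounded index stays useful.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue phrases: a crude stand-in for a real preference detector.
PREFERENCE_CUES = (
    "i prefer", "i like", "i love", "i dislike",
    "i hate", "i avoid", "i always", "i never",
)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def is_preference(utterance: str) -> bool:
    """Assumed filter: keep an utterance only if it contains a cue phrase."""
    lowered = utterance.lower()
    return any(cue in lowered for cue in PREFERENCE_CUES)


@dataclass
class PreferenceIndex:
    """Bounded store that keeps only preference-bearing utterances."""
    max_entries: int = 1000
    entries: list[str] = field(default_factory=list)

    def update(self, utterance: str) -> bool:
        """Streaming update: returns True if the utterance was retained."""
        if not is_preference(utterance):
            return False  # discarded, so it never consumes index memory
        self.entries.append(utterance)
        if len(self.entries) > self.max_entries:
            self.entries.pop(0)  # evict the oldest entry to respect the budget
        return True

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return up to k stored preferences ranked by token overlap."""
        query_tokens = tokenize(query)
        scored = [(len(query_tokens & tokenize(e)), e) for e in self.entries]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]


index = PreferenceIndex(max_entries=2)
stream = [
    "The weather was cloudy on Monday.",
    "I prefer vegetarian restaurants for dinner.",
    "My train was ten minutes late.",
    "I avoid flights that depart before 8 am.",
]
kept = [index.update(utterance) for utterance in stream]
top = index.retrieve("Suggest restaurants for dinner tonight", k=1)
```

In this toy run only the two preference statements are stored, and the dinner query retrieves the vegetarian preference, which a generator could then use as grounding context. A real system would presumably replace both heuristics with learned components; the sketch only shows where filtering and retrieval sit in the pipeline.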