Fixed-Persona SLMs with Modular Memory: Scalable NPC Dialogue on Consumer Hardware
This work addresses the need for efficient, persona-driven dialogue systems in gaming and in other domains such as virtual assistants. Its contribution is incremental: it applies existing SLMs, and the novelty lies in the modular memory design.
The paper addresses the difficulty of using large language models for NPC dialogue in games. It proposes a modular system that pairs small language models with swappable memory modules, achieving scalable, memory-rich interactions on consumer hardware without retraining or reloading models during gameplay.
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limitation arises from their substantial hardware requirements, latency constraints, and the necessity to maintain clearly defined knowledge boundaries within a game setting. In this paper, we propose a modular NPC dialogue system that leverages Small Language Models (SLMs), fine-tuned to encode specific NPC personas and integrated with runtime-swappable memory modules. These memory modules preserve character-specific conversational context and world knowledge, enabling expressive interactions and long-term memory without retraining or model reloading during gameplay. We comprehensively evaluate our system using three open-source SLMs: DistilGPT-2, TinyLlama-1.1B-Chat, and Mistral-7B-Instruct, trained on synthetic persona-aligned data and benchmarked on consumer-grade hardware. While our approach is motivated by applications in gaming, its modular design and persona-driven memory architecture hold significant potential for broader adoption in domains requiring expressive, scalable, and memory-rich conversational agents, such as virtual assistants, customer support bots, or interactive educational systems.
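To make the architecture in the abstract concrete, the following is a minimal sketch of one way a single loaded model could be shared across characters while per-character memory is swapped at runtime. The abstract does not specify how the memory modules are implemented, so everything here is an assumption for illustration: the names `MemoryModule`, `DialogueEngine`, and `stub_slm` are hypothetical, memory is modeled as plain text (world facts plus recent dialogue lines), and the fine-tuned SLM is replaced by a stub function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MemoryModule:
    """Hypothetical per-character memory: world facts plus dialogue history."""
    npc_name: str
    world_facts: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def build_prompt(self, player_line: str, max_lines: int = 4) -> str:
        # Only the most recent history lines are included, to bound prompt size.
        parts = [f"Known facts: {'; '.join(self.world_facts)}"]
        parts.extend(self.history[-max_lines:])
        parts.append(f"Player: {player_line}")
        parts.append(f"{self.npc_name}:")
        return "\n".join(parts)

    def record(self, player_line: str, reply: str) -> None:
        # Store both sides of the exchange so later prompts can recall them.
        self.history.append(f"Player: {player_line}")
        self.history.append(f"{self.npc_name}: {reply}")


class DialogueEngine:
    """Holds one generator (loaded once) and swaps memory modules around it."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self.generate = generate  # assumed stand-in for a fine-tuned SLM
        self.modules: Dict[str, MemoryModule] = {}
        self.active: Optional[MemoryModule] = None

    def register(self, module: MemoryModule) -> None:
        self.modules[module.npc_name] = module

    def swap_to(self, npc_name: str) -> None:
        # Swapping changes only which memory is active; the model is untouched.
        self.active = self.modules[npc_name]

    def talk(self, player_line: str) -> str:
        if self.active is None:
            raise RuntimeError("no memory module is active; call swap_to first")
        reply = self.generate(self.active.build_prompt(player_line))
        self.active.record(player_line, reply)
        return reply


def stub_slm(prompt: str) -> str:
    # Placeholder generator: reports how many prompt lines it was given.
    return f"(reply conditioned on {len(prompt.splitlines())} prompt lines)"


engine = DialogueEngine(stub_slm)
engine.register(MemoryModule("Blacksmith", world_facts=["The forge is in Eastgate"]))
engine.register(MemoryModule("Innkeeper", world_facts=["Rooms cost two silver"]))

engine.swap_to("Blacksmith")
first = engine.talk("Can you repair my sword?")
engine.swap_to("Innkeeper")
engine.talk("Do you have a room?")
engine.swap_to("Blacksmith")
second = engine.talk("Is it ready yet?")
```

In this sketch, returning to the Blacksmith yields a longer prompt than the first exchange because that character's earlier turn was retained while the Innkeeper's module was active. A real system would substitute an actual SLM inference call for `stub_slm` and might store memory in a different form entirely.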