KnowsLM: A framework for evaluation of small language models for knowledge augmentation and humanised conversations
This work addresses the problem of balancing knowledge and style in conversational AI for developers building on small language models; its contribution is incremental, since it compares existing methods such as fine-tuning and RAG.
This study tackled the challenge of generating human-like dialogue with small language models by evaluating how LoRA rank, dataset scale, and prompt design affect knowledge retention and stylistic alignment. It found that fine-tuning improves fluency and style but struggles with unseen knowledge, while RAG improves factual accuracy but lacks stylistic consistency.
In the evolving landscape of conversational AI, generating concise, context-aware, and human-like dialogue with small and medium-sized language models remains a complex challenge. This study investigates how LoRA rank, dataset scale, and prompt prefix design influence both knowledge retention and stylistic alignment. While fine-tuning improves fluency and enables stylistic customization, its ability to integrate unseen knowledge is constrained, particularly with smaller datasets. Conversely, retrieval-augmented generation (RAG) models, which incorporate external documents at inference time, demonstrated superior factual accuracy on out-of-distribution prompts, though they lacked the stylistic consistency achieved by fine-tuning. Evaluations by LLM-based judges across knowledge accuracy, conversational quality, and conciseness suggest that fine-tuning is best suited to tone adaptation, whereas RAG excels at real-time knowledge augmentation.
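To make "LoRA rank" concrete, the sketch below follows the standard LoRA formulation: a frozen weight matrix W receives a trainable low-rank update B·A scaled by alpha/r, so a rank-r adapter on a d_out × d_in matrix adds r·(d_out + d_in) trainable parameters. This is an illustrative pure-Python sketch, not the authors' training code. The class and function names, the 4096 hidden size, and the rank values are hypothetical choices for illustration only.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical layer: frozen d_out x d_in weight W plus a rank-r update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight                  # frozen pretrained weights
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank             # LoRA scaling factor alpha / r
        # A (r x d_in) starts small and random; B (d_out x r) starts at zero,
        # so the adapted layer initially reproduces the frozen layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        # Compute B (A x) without ever materializing the d_out x d_in product B @ A.
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B are trained: r * d_in + d_out * r values.
        return self.rank * (len(self.weight) + len(self.weight[0]))


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters a rank-r adapter adds to one d_out x d_in matrix."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 4096  # hypothetical hidden size of one square projection matrix
    for r in (4, 8, 16, 64):
        added = lora_param_count(d, d, r)
        print(f"rank {r:>2}: {added:>9,} trainable params ({added / (d * d):.2%} of full matrix)")
```

Raising the rank grows the adapter's capacity linearly, yet even rank 64 trains only a few percent of one matrix's weights. This is one plausible reason why rank alone may not let a fine-tuned model absorb much unseen knowledge, although the study's own analysis is the authority on that point.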