On-Device LLMs for Home Assistant: Dual Role in Intent Detection and Response Generation
The paper addresses the need for unified, on-device AI in home automation without specialized hardware, though its contribution is incremental: it applies existing quantization techniques to this domain.
This paper tackles the use of fine-tuned Large Language Models (LLMs) for both slot/intent detection and response generation in smart home assistants on resource-limited edge hardware, reporting around 80-86% accuracy on noisy human prompts and out-of-domain intents and an average inference time of 5-6 seconds per query.
This paper investigates whether Large Language Models (LLMs), fine-tuned on synthetic but domain-representative data, can perform the twofold task of (i) slot and intent detection and (ii) natural language response generation for a smart home assistant, while running solely on resource-limited, CPU-only edge hardware. We fine-tune LLMs to produce both JSON action calls and text responses. Our experiments show that 16-bit and 8-bit quantized variants preserve high accuracy on slot and intent detection and maintain strong semantic coherence in generated text, whereas the 4-bit model retains generative fluency but suffers a noticeable drop in device-service classification accuracy. Further evaluations on noisy human (non-synthetic) prompts and out-of-domain intents confirm the models' generalization ability, obtaining around 80-86% accuracy. The average inference time is 5-6 seconds per query, which is acceptable for one-shot commands but suboptimal for multi-turn dialogue. Even so, our results affirm that an on-device LLM can effectively unify command interpretation and flexible response generation for home automation without relying on specialized hardware.
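To make the dual-output idea concrete, the sketch below shows one way a host application might separate a model reply into its text response and its JSON action call. The reply layout (free text followed by a single JSON object), the helper name `split_reply`, and the field names `service` and `target_device` are assumptions made for illustration; the abstract does not specify the exact output format.

```python
import json
from typing import Any, Dict, Optional, Tuple


def split_reply(raw: str) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Split a model reply into (text response, JSON action call or None).

    Assumed layout (hypothetical): natural-language text plus one JSON object.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns the value plus the index where parsing stopped.
            action, end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next candidate.
            start = raw.find("{", start + 1)
            continue
        text = (raw[:start] + raw[end:]).strip()
        return text, action
    # No parseable action call: treat the whole reply as text.
    return raw.strip(), None


# Illustrative reply; the field names are assumptions, not the paper's schema.
reply = (
    "Turning on the kitchen light.\n"
    '{"service": "light.turn_on", "target_device": "light.kitchen"}'
)
text, action = split_reply(reply)
```

Falling back to a text-only result when no valid JSON is found lets the assistant still answer the user if the model emits a malformed action call, which may matter more for heavily quantized variants.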