MM · CL · LG · Feb 17

Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU

arXiv:2602.15707v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses privacy and computational efficiency for users performing manual tasks. It is an incremental advance: it builds on existing language models and contributes a new finetuning method.

The authors tackle real-time conversational assistance for procedural tasks with a system that uses only audio and IMU inputs to guide furniture assembly, achieving a >30% F-score improvement and a 16x speedup.
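The page does not say exactly what the F-score is computed over. One plausible reading, and purely an assumption here, is that it scores the assistant's decision to speak or stay silent at each moment against a reference conversation. Under that framing, a very talkative model gets perfect recall but poor precision, which is why suppressing less informative turns can raise the F-score. The function `speak_f1` and the toy sequences below are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Sequence


def speak_f1(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """F1 of per-step 'speak' decisions (True = speak, False = stay silent).

    Hypothetical framing: this is NOT the paper's metric definition.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # spoke when it should
    fp = sum(1 for p, r in pairs if p and not r)    # spoke needlessly
    fn = sum(1 for p, r in pairs if r and not p)    # missed an instruction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy reference: the assistant should speak only at steps 0 and 3.
reference = [True, False, False, True, False, False]
talkative = [True] * 6                                  # speaks at every step
restrained = [True, False, False, True, False, True]    # one needless turn

print(round(speak_f1(talkative, reference), 3))   # 0.5
print(round(speak_f1(restrained, reference), 3))  # 0.8
```

Both toy predictors have recall 1.0; the restrained one scores higher only because it produces fewer needless turns, mirroring the "talkative assistant" problem described in the abstract.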

Real-time conversational assistants for procedural tasks often depend on video input, which can be computationally expensive and compromise user privacy. For the first time, we propose a real-time conversational assistant that provides comprehensive guidance for a procedural task using only lightweight, privacy-preserving modalities, namely audio and IMU inputs from a user's wearable device, to understand the context. This assistant proactively communicates step-by-step instructions to a user performing a furniture assembly task and answers the user's questions. We construct a dataset containing conversations in which the assistant guides the user through the task. Observing that an off-the-shelf language model is an overly talkative assistant, we design a novel User Whim Agnostic (UWA) LoRA finetuning method that improves the model's ability to suppress less informative dialogue while maintaining its tendency to communicate important instructions. This leads to a >30% improvement in the F-score. Finetuning the model also yields a 16x speedup by eliminating the need to provide in-context examples in the prompt. We further describe how such an assistant is implemented on edge devices with no dependence on the cloud.
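The abstract does not describe what makes the finetuning "User Whim Agnostic", so the sketch below shows only the generic LoRA building block it is based on: a frozen weight `W` plus a trainable low-rank update scaled by `alpha / r`, with `B` initialized to zero so the adapted layer initially matches the base model. The class name `LoRALinear`, the pure-Python matrix helpers, and the chosen sizes are hypothetical and for illustration only; a real implementation would use a tensor library, and the UWA-specific training recipe is not represented here.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA layer: y = W x + (alpha / r) * B (A x).

    W stays frozen; only A (r x d_in) and B (d_out x r) would be trained.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight          # frozen base weight, never updated
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A gets a small random init; B starts at zero so the adapter is a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # r * (d_in + d_out) adapter weights instead of d_in * d_out.
        return len(self.A) * (len(self.A[0]) + len(self.B))


# Usage: an 8x8 identity "pretrained" weight with a rank-2 adapter.
W = [[float(i == j) for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [float(v) for v in range(8)]

print(layer.forward(x) == x)     # True: B == 0, so output equals W x
print(layer.trainable_params())  # 32, versus 64 entries in W
```

Because only `A` and `B` change during finetuning, the base model stays intact and the adapter is small, which suits the on-device setting the abstract describes.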

