VoCopilot: Voice-Activated Tracking of Everyday Interactions
This work addresses the need for accessible tracking of vocal interactions for users who want insights from their daily conversations, though it appears incremental because it builds on existing technologies such as acoustic hardware and large language models.
The paper tackles the problem of tracking vocal interactions in everyday life by introducing VoCopilot, an end-to-end system that continuously records, transcribes, and extracts insights from conversations using energy-efficient hardware and machine learning models. Its effectiveness is demonstrated in two real-world use cases.
Voice plays an important role in our lives by facilitating communication, conveying emotions, and indicating health. Therefore, tracking vocal interactions can provide valuable insight into many aspects of our lives. This paper presents our ongoing efforts to design a new vocal tracking system we call VoCopilot. VoCopilot is an end-to-end system centered around energy-efficient acoustic hardware and firmware combined with advanced machine learning models. As a result, VoCopilot is able to continuously track conversations, record them, transcribe them, and then extract useful insights from them. By utilizing large language models, VoCopilot lets the user draw these insights from recorded interactions without having to learn complex machine learning techniques. To protect the privacy of end users, VoCopilot uses a novel wake-up mechanism that records only the conversations of end users. Additionally, the rest of the pipeline can run on a commodity computer (Mac Mini M2). In this work, we show the effectiveness of VoCopilot in a real-world environment for two use cases.
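The order of stages described above (wake-up gating, recording, transcription, insight extraction) can be illustrated with a minimal Python sketch. It is not VoCopilot's implementation: the `Segment` type, the `run_pipeline` function, and the three callables are hypothetical names introduced here, and the wake-up check, which the abstract places in hardware and firmware, is reduced to a plain software predicate. The transcriber and the language-model step are stand-in stubs so the example runs without any external model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    """A chunk of audio; `samples` stands in for real PCM data (hypothetical type)."""
    speaker: str
    samples: List[float]


def run_pipeline(
    segments: Iterable[Segment],
    is_enrolled_user: Callable[[Segment], bool],
    transcribe: Callable[[Segment], str],
    extract_insights: Callable[[List[str]], str],
) -> str:
    """Gate, transcribe, then summarize, in the order the abstract lists."""
    transcripts: List[str] = []
    for segment in segments:
        # Wake-up gate (assumed behavior): segments that fail the check
        # are dropped before they are stored or transcribed.
        if not is_enrolled_user(segment):
            continue
        transcripts.append(transcribe(segment))
    # A real system would pass the transcripts to a large language model here.
    return extract_insights(transcripts)


# Stub stages for illustration only; none of these reflect the paper's models.
segments = [
    Segment("alice", [0.1]),
    Segment("stranger", [0.2, 0.3]),
    Segment("alice", [0.4, 0.5, 0.6]),
]
result = run_pipeline(
    segments,
    is_enrolled_user=lambda s: s.speaker == "alice",
    transcribe=lambda s: f"{s.speaker}:{len(s.samples)}",
    extract_insights=lambda ts: " | ".join(ts),
)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular speech or language model; each stub could be swapped for a real component without changing the control flow.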