AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks
AURA provides an open-source solution for voice-driven tasks that enables complex interactions with tools, though the contribution is incremental because it builds on existing technologies.
The paper tackles the lack of open-source systems for speech-to-speech, multi-turn dialogue with tool use by introducing AURA, which scores 92.75% on the OpenBookQA subset of VoiceBench and reaches 90% task success in human evaluation.
Despite advances in language and speech technologies, no open-source system enables full speech-to-speech, multi-turn dialogue with integrated tool use and agentic reasoning. We introduce AURA (Agent for Understanding, Reasoning, and Automated Tool Use), the first open-source, speech-native assistant capable of completing complex, goal-driven tasks through dynamic tool invocation and multi-turn conversation. AURA combines open-weight ASR, TTS, and LLMs in a cascaded pipeline and supports tools such as calendar booking, contact lookup, web search, and email. Its modular design allows easy integration of new tools using natural language prompts and action classes. On VoiceBench, AURA scores 92.75% on OpenBookQA (outperforming all open-weight systems and nearing GPT-4o) and 4.39 on AlpacaEval, competitive with other open-weight systems. Human evaluation shows 90% task success on complex, multi-turn speech tasks.
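
The abstract does not show any code, so the following is only a minimal sketch of how a cascaded ASR, LLM, and TTS pipeline with pluggable tools might be organized. Every name here (`Action`, `ToolRegistry`, `stub_asr`, `stub_llm`, `stub_tts`, `run_turn`) is hypothetical and not taken from AURA. The three stages are replaced with trivial stand-ins: the "ASR" decodes bytes, the "LLM" routes by keyword, and the "TTS" encodes text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Action:
    """Hypothetical action class: a tool name, a natural-language
    description for the LLM prompt, and a handler function."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolRegistry:
    """Holds the registered actions and renders them as prompt text."""

    def __init__(self) -> None:
        self._actions: Dict[str, Action] = {}

    def register(self, action: Action) -> None:
        self._actions[action.name] = action

    def names(self) -> List[str]:
        return list(self._actions)

    def prompt(self) -> str:
        # One "name: description" line per tool, in registration order.
        return "\n".join(
            f"{a.name}: {a.description}" for a in self._actions.values()
        )

    def invoke(self, name: str, argument: str) -> str:
        if name not in self._actions:
            return f"unknown tool: {name}"
        return self._actions[name].run(argument)


def stub_asr(audio: bytes) -> str:
    # Stand-in for a speech recognizer: treats the bytes as UTF-8 text.
    return audio.decode("utf-8")


def stub_llm(transcript: str, tool_names: List[str]) -> Tuple[str, str]:
    # Stand-in for an LLM: picks the first tool whose leading keyword
    # (the part of its name before "_") appears in the transcript.
    lowered = transcript.lower()
    for name in tool_names:
        if name.split("_")[0] in lowered:
            return name, transcript
    return "", transcript


def stub_tts(text: str) -> bytes:
    # Stand-in for a speech synthesizer: returns the text as bytes.
    return text.encode("utf-8")


def run_turn(audio: bytes, registry: ToolRegistry) -> bytes:
    """One cascaded turn: ASR -> LLM (optional tool call) -> TTS."""
    transcript = stub_asr(audio)
    tool, argument = stub_llm(transcript, registry.names())
    if tool:
        reply = f"[{tool}] {registry.invoke(tool, argument)}"
    else:
        reply = f"You said: {transcript}"
    return stub_tts(reply)


# Adding a tool takes one description string and one handler.
registry = ToolRegistry()
registry.register(Action(
    "contact_lookup",
    "Find a contact's email address by name.",
    lambda query: "alice@example.com",  # placeholder data, not a real lookup
))
registry.register(Action(
    "calendar_booking",
    "Book a calendar slot.",
    lambda query: "booked",
))
```

Under these assumptions, adding a tool means registering one more `Action`; the pipeline function itself does not change, which is the kind of modularity the abstract describes.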