HCAI | Nov 7, 2024

CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR

arXiv:2411.04671v3 | 11 citations | h-index: 44 | Has Code | 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR)
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more engaging and naturalistic user interfaces in XR applications, such as training and entertainment, by providing a customizable tool for developers, though it is incremental as it builds on existing LLM and speech technologies.

The paper tackles the challenge of enabling natural conversational interactions in extended reality (XR) by introducing CUIfy, an open-source Unity package that integrates large language models (LLMs) with speech-to-text and text-to-speech models to create intelligent non-player characters, achieving low-latency streaming for usable interactions.

Recent developments in computer graphics, machine learning, and sensor technologies enable numerous opportunities for extended reality (XR) setups for everyday life, from skills training to entertainment. With large corporations offering affordable consumer-grade head-mounted displays (HMDs), XR will likely become pervasive, and HMDs will develop as personal devices like smartphones and tablets. However, having intelligent spaces and naturalistic interactions in XR is as important as technological advances so that users grow their engagement in virtual and augmented spaces. To this end, large language model (LLM)-powered non-player characters (NPCs) with speech-to-text (STT) and text-to-speech (TTS) models bring significant advantages over conventional or pre-scripted NPCs for facilitating more natural conversational user interfaces (CUIs) in XR. This paper provides the community with an open-source, customizable, extendable, and privacy-aware Unity package, CUIfy, that facilitates speech-based NPC-user interaction with widely used LLMs, STT, and TTS models. Our package also supports multiple LLM-powered NPCs per environment and minimizes latency between different computational models through streaming to achieve usable interactions between users and NPCs. We publish our source code in the following repository: https://gitlab.lrz.de/hctl/cuify
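The abstract's point about reducing latency through streaming can be illustrated with a small sketch. This is not CUIfy's implementation (the package is written for Unity). It is a minimal Python illustration of one common approach, assuming the LLM emits its reply incrementally and that speech synthesis can begin on each completed sentence. The names `fake_llm_stream`, `sentences_from_tokens`, and `fake_tts` are hypothetical stand-ins, not APIs of any real service.

```python
from typing import Iterable, Iterator

# Characters treated as sentence boundaries (a simplifying assumption).
SENTENCE_END = (".", "!", "?")


def fake_llm_stream(reply: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM API: yields the reply word by word."""
    for word in reply.split(" "):
        yield word + " "


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so synthesis can start before the reply ends."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush any trailing text that did not end with punctuation.
    if buffer.strip():
        yield buffer.strip()


def fake_tts(sentence: str) -> str:
    """Hypothetical stand-in for a TTS call: returns a placeholder instead of audio."""
    return f"<audio:{sentence}>"


def npc_reply(reply: str) -> Iterator[str]:
    """Chain the stages lazily: each sentence is synthesized as soon as it is complete."""
    for sentence in sentences_from_tokens(fake_llm_stream(reply)):
        yield fake_tts(sentence)


if __name__ == "__main__":
    for clip in npc_reply("Hello there. How can I help?"):
        print(clip)
```

Because every stage is a generator, the first audio clip is available once the first sentence has been received, so the user does not wait for the full LLM reply before the NPC starts speaking.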

