HCAI · Apr 25, 2024

Leveraging AI to Generate Audio for User-generated Content in Video Games

arXiv:2404.17018v1 · 4 citations · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This paper addresses a practical issue for video game developers and players by enabling dynamic audio creation, though it appears incremental because it applies existing AI techniques to a new domain.

The paper tackles the problem of generating audio for user-generated content in video games, where pre-created assets are impractical, by exploring generative AI methods like text-to-audio and image-to-audio, and presents prototype games as proof-of-concept.

In video game design, audio (both environmental background music and object sound effects) plays a critical role. Sounds are typically pre-created assets designed for specific locations or objects in a game. However, user-generated content (e.g., building custom environments or crafting unique objects) is becoming increasingly popular in modern games. Since the possibilities are virtually limitless, it is impossible for game creators to pre-create audio for user-generated content. We explore the use of generative artificial intelligence to create music and sound effects on the fly based on user-generated content. We investigate two avenues for audio generation: 1) text-to-audio: using a text description of user-generated content as input to the audio generator, and 2) image-to-audio: using a rendering of the created environment or object as input to an image-to-text generator, then piping the resulting text description into the audio generator. In this paper, we discuss ethical implications of using generative artificial intelligence for user-generated content and highlight two prototype games where audio is generated for user-created environments and objects.
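
The two avenues described in the abstract can be sketched as a small pipeline in which the image path simply feeds a caption into the text path. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `AudioPipeline`, `Captioner`, `AudioGenerator`, `fake_captioner`, and `fake_audio_generator` are hypothetical, and the two stand-in functions only mimic the shape of real image-to-text and text-to-audio models so the example runs without any ML dependency.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type aliases (assumptions for this sketch, not from the paper).
Captioner = Callable[[bytes], str]       # rendered image bytes -> text description
AudioGenerator = Callable[[str], bytes]  # text prompt -> encoded audio bytes


@dataclass
class AudioPipeline:
    """Routes user-generated content to an audio generator via text."""

    generate_audio: AudioGenerator
    caption_image: Optional[Captioner] = None

    def from_text(self, description: str) -> bytes:
        """Avenue 1 (text-to-audio): the description is the prompt."""
        prompt = description.strip()
        if not prompt:
            raise ValueError("description must not be empty")
        return self.generate_audio(prompt)

    def from_image(self, rendering: bytes) -> bytes:
        """Avenue 2 (image-to-audio): caption the rendering, then reuse avenue 1."""
        if self.caption_image is None:
            raise RuntimeError("no image captioner configured")
        return self.from_text(self.caption_image(rendering))


# Stand-in models: placeholders for a real captioner and audio generator.
def fake_captioner(rendering: bytes) -> str:
    return f"a rendered scene of {len(rendering)} bytes"


def fake_audio_generator(prompt: str) -> bytes:
    return ("AUDIO:" + prompt).encode("utf-8")


pipeline = AudioPipeline(
    generate_audio=fake_audio_generator,
    caption_image=fake_captioner,
)
text_clip = pipeline.from_text("a mossy stone cave with dripping water")
image_clip = pipeline.from_image(b"\x89PNG")
```

Passing the models in as plain callables keeps the sketch independent of any particular generator; in a game, the stand-ins would be replaced by whichever image-to-text and text-to-audio models the developer chooses.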

