SD, AI, MM, AS | Mar 4, 2025

A Multimodal Symphony: Integrating Taste and Sound through Generative AI

arXiv:2503.02823v1 | 2 citations | h-index: 1 | Has Code | Frontiers of Computer Science
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of creating embodied interactions between AI, sound, and taste, representing an incremental step in multimodal generative AI.

The paper addresses the problem of converting taste information into music with generative AI. In an evaluation with 111 participants, a fine-tuned MusicGEN model produced music that reflected the input taste descriptions more coherently than the non-fine-tuned model.

In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perceptions. This article explores multimodal generative models capable of converting taste information into music, building on this foundational research. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according to the participants' ($n=111$) evaluation, the fine-tuned model produces music that more coherently reflects the input taste descriptions compared to the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code and pre-trained model at: https://osf.io/xs5jy/.
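As a rough illustration of the text-conditioned generation the abstract describes, the sketch below turns a taste profile into a prompt and passes it to a MusicGen checkpoint through the Hugging Face `transformers` API. Several parts are assumptions rather than the authors' method: the four taste labels, the `taste_prompt` helper, the prompt wording, and the use of the public `facebook/musicgen-small` checkpoint as a stand-in for the fine-tuned model.

```python
from typing import Dict

# Hypothetical taste dimensions; the paper's actual description format may differ.
TASTES = ("sweet", "sour", "bitter", "salty")


def taste_prompt(intensities: Dict[str, float], threshold: float = 0.2) -> str:
    """Turn per-taste intensities in [0, 1] into a short text prompt (assumed wording)."""
    for name, value in intensities.items():
        if name not in TASTES:
            raise ValueError(f"unknown taste: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"intensity out of range for {name}: {value}")
    # Keep tastes at or above the threshold, strongest first.
    # sorted() is stable, so ties keep the order of TASTES.
    kept = sorted(
        (n for n in TASTES if intensities.get(n, 0.0) >= threshold),
        key=lambda n: -intensities.get(n, 0.0),
    )
    if not kept:
        return "music with a neutral taste"
    return "music that tastes " + ", ".join(kept)


def generate_audio(prompt: str,
                   checkpoint: str = "facebook/musicgen-small",
                   max_new_tokens: int = 256):
    """Generate a mono waveform from a text prompt with a MusicGen checkpoint.

    The default checkpoint is the public base model, used here only as a
    stand-in; it is not the fine-tuned model released by the authors.
    """
    # Imported lazily so taste_prompt() works without the heavy dependencies.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = MusicgenForConditionalGeneration.from_pretrained(checkpoint)
    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    audio = model.generate(**inputs, max_new_tokens=max_new_tokens)
    sampling_rate = model.config.audio_encoder.sampling_rate
    # audio has shape (batch, channels, samples); return the first channel.
    return audio[0, 0].cpu().numpy(), sampling_rate


if __name__ == "__main__":
    print(taste_prompt({"sweet": 0.9, "sour": 0.1, "bitter": 0.4}))
```

Calling `generate_audio(taste_prompt({...}))` downloads the base checkpoint on first use. Comparing its output with that of the released fine-tuned model would mirror the fine-tuned versus non-fine-tuned contrast in the abstract, though loading the OSF weights is not shown here because their format is not described in this summary.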

