SD AI AS Oct 18, 2025

MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding

Jingyue Huang, Zachary Novack, Phillip Long, Yupeng Hou, Ke Chen, Taylor Berg-Kirkpatrick, Julian McAuley

arXiv:2510.16273v1 4.0 h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses the need for effective symbolic music tokenization for researchers and practitioners in music AI, offering incremental improvements over existing methods.

The authors address discrete representation learning for symbolic music by proposing MuseTok, a tokenization method that applies an RQ-VAE to bar-wise music segments within a Transformer-based encoder-decoder framework. Models built on MuseTok outperform previous representation learning baselines on semantic understanding tasks such as melody extraction and chord recognition, while maintaining comparable performance in generation.

Discrete representation learning has shown promising results across various domains, including generation and understanding in image, speech and language. Inspired by these advances, we propose MuseTok, a tokenization method for symbolic music, and investigate its effectiveness in both music generation and understanding tasks. MuseTok employs the residual vector quantized-variational autoencoder (RQ-VAE) on bar-wise music segments within a Transformer-based encoder-decoder framework, producing music codes that achieve high-fidelity music reconstruction and accurate understanding of music theory. For comprehensive evaluation, we apply MuseTok to music generation and semantic understanding tasks, including melody extraction, chord recognition, and emotion recognition. Models incorporating MuseTok outperform previous representation learning baselines in semantic understanding while maintaining comparable performance in content generation. Furthermore, qualitative analyses on MuseTok codes, using ground-truth categories and synthetic datasets, reveal that MuseTok effectively captures underlying musical concepts from large music collections.
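The residual quantization step at the core of an RQ-VAE can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-dimensional vectors, and the hand-written codebooks below are all hypothetical, and in a real RQ-VAE the codebooks are learned jointly with the encoder and decoder. The sketch only shows how a single latent vector (standing in for the encoding of one bar) becomes a short sequence of code indices, one per quantization level, with each level quantizing what the previous levels left unexplained.

```python
def nearest(codebook, vector):
    """Return the index of the codeword closest to vector (squared Euclidean distance)."""
    def sq_dist(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(vector, codebooks):
    """Quantize vector level by level; each level encodes the remaining residual.

    Returns the list of chosen code indices (one per level) and the
    reconstruction, which is the sum of the chosen codewords.
    """
    residual = list(vector)
    reconstruction = [0.0] * len(vector)
    codes = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codeword = codebook[index]
        codes.append(index)
        # Accumulate the chosen codeword and subtract it from the residual.
        reconstruction = [r + c for r, c in zip(reconstruction, codeword)]
        residual = [r - c for r, c in zip(residual, codeword)]
    return codes, reconstruction


# Hypothetical hand-written codebooks: a coarse level followed by a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, -0.5]],
]
bar_latent = [1.4, 0.6]  # stands in for the encoder output of one bar

codes, reconstruction = residual_quantize(bar_latent, codebooks)
print(codes)           # [1, 1]
print(reconstruction)  # [1.5, 0.5]
```

In this toy case the first level picks the coarse codeword `[1.0, 1.0]` and the second level refines the remaining residual with `[0.5, -0.5]`, so the bar is represented by the code sequence `[1, 1]`. Adding levels generally tightens the reconstruction at the cost of more tokens per bar.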
