A Bi-directional Transformer for Musical Chord Recognition
This work addresses chord recognition for music analysis, but the contribution is incremental, as it applies an existing Transformer architecture to a domain-specific task.
The paper tackles chord recognition in music by proposing a bi-directional Transformer model (BTC) that uses self-attention to capture long-term dependencies, achieving competitive performance with a single training phase.
Chord recognition is an important task since chords are highly abstract and descriptive features of music. For effective chord recognition, it is essential to utilize the relevant context in the audio sequence. While various machine learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been employed for the task, most of them have limitations in capturing long-term dependencies or require training an additional model. In this work, we utilize a self-attention mechanism for chord recognition to focus on certain regions of chords. The proposed bi-directional Transformer for chord recognition (BTC) is trained in a single phase while showing competitive performance. Through an attention map analysis, we visualize how attention is performed. The analysis shows that the model is able to divide segments of chords by utilizing the adaptive receptive field of the attention mechanism. Furthermore, the model is able to effectively capture long-term dependencies, making use of essential information regardless of distance.
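To make the idea of bi-directional self-attention over audio frames more concrete, the following is a minimal sketch, not the BTC implementation. It assumes that "bi-directional" can be realized as two masked attention passes, one where each frame attends only to itself and earlier frames and one where it attends only to itself and later frames, with the two outputs concatenated per frame. The function names (`masked_self_attention`, `bidirectional_self_attention`), the single attention head, and the absence of learned projections are all simplifications introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries set to -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(frames, direction):
    """Single-head scaled dot-product self-attention over feature frames.

    direction="forward":  frame t attends to frames 0..t.
    direction="backward": frame t attends to frames t..T-1.
    Queries, keys and values are the raw frames (hypothetical simplification:
    a real model would apply learned linear projections first).
    """
    T, d = len(frames), len(frames[0])
    outputs, weights = [], []
    for t in range(T):
        scores = []
        for s in range(T):
            visible = s <= t if direction == "forward" else s >= t
            if visible:
                dot = sum(a * b for a, b in zip(frames[t], frames[s]))
                scores.append(dot / math.sqrt(d))
            else:
                scores.append(float("-inf"))  # masked position
        w = softmax(scores)
        # Output for frame t is the attention-weighted sum of all frames.
        out = [sum(w[s] * frames[s][k] for s in range(T)) for k in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def bidirectional_self_attention(frames):
    """Concatenate forward-masked and backward-masked attention outputs per frame."""
    fw_out, fw_w = masked_self_attention(frames, "forward")
    bw_out, bw_w = masked_self_attention(frames, "backward")
    combined = [f + b for f, b in zip(fw_out, bw_out)]
    return combined, (fw_w, bw_w)


if __name__ == "__main__":
    # Three toy 2-dimensional "audio feature" frames (made-up values).
    toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    combined, (fw_w, bw_w) = bidirectional_self_attention(toy_frames)
    for t, row in enumerate(fw_w):
        print("forward attention weights for frame", t, row)
```

The rows of `fw_w` and `bw_w` are the kind of quantity an attention map visualizes: each row shows how much one frame draws on every other frame. Because the weights are computed from content rather than from a fixed window, a frame can attend to a distant frame as strongly as to a neighbor.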