Harmonic Reasoning in Large Language Models
This work identifies specific limitations of LLMs in musical reasoning, which could inform improvements in artistic and other complex applications, though its scope is incremental.
The paper investigated how well large language models (LLMs) such as GPT-3.5 and GPT-4o handle musical reasoning tasks: identifying notes from intervals, chords, and scales. It found that the models perform well on intervals but struggle with the more complex tasks of chord and scale recognition.
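All three task types come down to interval arithmetic over note names, so reference answers can be computed deterministically. The following is a minimal Python sketch, not the paper's code. It assumes ASCII note names (`#` for sharp, `b` for flat), common interval abbreviations such as `m3` and `P5`, and small illustrative chord and scale tables. The helpers `transpose`, `spell_chord` and `spell_scale` are hypothetical names. The sketch keeps the correct letter name, so a minor third above C is spelled Eb rather than D#.

```python
# Hypothetical reference-answer helpers; not taken from the paper.
LETTERS = "CDEFGAB"
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Interval abbreviation -> (letter steps above the root, semitones above the root).
INTERVALS = {
    "P1": (0, 0), "m2": (1, 1), "M2": (1, 2), "m3": (2, 3), "M3": (2, 4),
    "P4": (3, 5), "A4": (3, 6), "d5": (4, 6), "P5": (4, 7), "A5": (4, 8),
    "m6": (5, 8), "M6": (5, 9), "m7": (6, 10), "M7": (6, 11),
}

# Illustrative formula tables: each entry lists intervals measured from the root.
CHORDS = {
    "major": ["P1", "M3", "P5"],
    "minor": ["P1", "m3", "P5"],
    "diminished": ["P1", "m3", "d5"],
    "augmented": ["P1", "M3", "A5"],
    "dominant 7th": ["P1", "M3", "P5", "m7"],
}
SCALES = {
    "major": ["P1", "M2", "M3", "P4", "P5", "M6", "M7"],
    "natural minor": ["P1", "M2", "m3", "P4", "P5", "m6", "m7"],
}


def transpose(note, interval):
    """Spell the note `interval` above `note`, keeping the correct letter name."""
    steps, semitones = INTERVALS[interval]
    letter, acc = note[0], note[1:]
    root_pc = NATURAL[letter] + acc.count("#") - acc.count("b")
    # The interval number fixes the letter; the semitone count fixes the pitch.
    target_letter = LETTERS[(LETTERS.index(letter) + steps) % 7]
    # Accidental that moves the target letter's natural pitch onto the target pitch.
    diff = (root_pc + semitones - NATURAL[target_letter]) % 12
    if diff > 6:
        diff -= 12  # prefer flats over a long run of sharps
    return target_letter + ("#" * diff if diff > 0 else "b" * -diff)


def spell_chord(root, quality):
    return [transpose(root, iv) for iv in CHORDS[quality]]


def spell_scale(root, mode):
    return [transpose(root, iv) for iv in SCALES[mode]]


if __name__ == "__main__":
    print(transpose("C", "m3"))       # Eb
    print(spell_chord("B", "major"))  # ['B', 'D#', 'F#']
    print(spell_scale("F", "major"))  # ['F', 'G', 'A', 'Bb', 'C', 'D', 'E']
```

The interval task needs one `transpose` call. A chord or scale needs several correct calls in a row, which is consistent with the summary's observation that those tasks are harder.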
Large Language Models (LLMs) are increasingly popular and are used for many purposes, including creative tasks in the arts. However, these models sometimes struggle with specific reasoning tasks, especially those that involve logical reasoning and counting. This paper examines how well LLMs understand and reason about musical tasks such as determining notes from intervals and identifying chords and scales. We tested GPT-3.5 and GPT-4o on these tasks. Our results show that while LLMs handle note intervals well, they struggle with more complex tasks such as recognizing chords and scales. These findings highlight clear limitations in current LLM capabilities and indicate where improvement is needed, which could benefit their reasoning in both artistic and other complex domains. We also provide an automatically generated benchmark dataset for the described tasks.
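The abstract mentions an automatically generated benchmark dataset but does not describe its format. One plausible design, shown here purely as an illustration, enumerates roots and item types and fills in a prompt template for each pair. Each answer is stored as a set of pitch classes, so enharmonic spellings such as G# and Ab grade as equal. The templates, item types, field names, and the `make_items` and `grade` helpers are all hypothetical. The actual benchmark may require exact spelling or note order.

```python
import itertools
import random

# Natural pitch classes; accidentals are '#' (sharp) and 'b' (flat).
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ROOTS = ["C", "Db", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

# Hypothetical item types: semitone offsets from the root.
TYPES = {
    ("interval", "major third"): (4,),
    ("interval", "perfect fifth"): (7,),
    ("chord", "major"): (0, 4, 7),
    ("chord", "minor"): (0, 3, 7),
    ("scale", "major"): (0, 2, 4, 5, 7, 9, 11),
}

# Hypothetical prompt wording for each task.
TEMPLATES = {
    "interval": "Which note is a {name} above {root}?",
    "chord": "Which notes form the {root} {name} chord?",
    "scale": "Which notes form the {root} {name} scale?",
}


def pitch_class(note):
    """Map a note name such as 'Eb' or 'F#' to an integer 0-11."""
    letter, acc = note[:1], note[1:]
    if letter not in NATURAL or set(acc) - {"#", "b"}:
        raise ValueError(f"unrecognised note name: {note!r}")
    return (NATURAL[letter] + acc.count("#") - acc.count("b")) % 12


def make_items(seed=0, n=None):
    """Enumerate every (root, type) prompt, optionally sampling n items reproducibly."""
    items = []
    for root, ((task, name), offsets) in itertools.product(ROOTS, TYPES.items()):
        items.append({
            "task": task,
            "prompt": TEMPLATES[task].format(root=root, name=name),
            "answer": frozenset((pitch_class(root) + o) % 12 for o in offsets),
        })
    if n is not None:
        items = random.Random(seed).sample(items, n)
    return items


def grade(response, item):
    """Compare the pitch classes named in a comma-separated response with the key."""
    try:
        named = {pitch_class(tok.strip()) for tok in response.split(",") if tok.strip()}
    except ValueError:
        return False  # response contains an unparseable note name
    return frozenset(named) == item["answer"]


if __name__ == "__main__":
    items = make_items()
    print(len(items))  # 60
    chord = next(i for i in items if i["prompt"] == "Which notes form the E major chord?")
    print(grade("E, G#, B", chord))  # True
    print(grade("E, G, B", chord))   # False
```

Storing pitch-class sets makes grading lenient about spelling and, for scales, about order. A stricter benchmark would compare spelled note lists instead.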