Bach or Mock? A Grading Function for Chorales in the Style of J.S. Bach
This work addresses the need for automatic, interpretable evaluation in music generation, benefiting researchers and practitioners by reducing reliance on costly expert assessments, though the contribution is incremental, as it builds on existing work on evaluation challenges in generative systems.
The paper tackles the problem of automatically evaluating the stylistic correctness of generated music, specifically four-part chorales in the style of J.S. Bach, by introducing a grading function that assesses key musical features. The results show that this function outperforms human experts at distinguishing Bach chorales from model-generated ones, providing an interpretable and effective evaluation measure.
Deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style, unlike traditional rule-based systems. Thus, it can be difficult to determine whether deep models generate stylistically correct output without expert evaluation, which is expensive and time-consuming. Therefore, there is a need for automatic, interpretable, and musically motivated evaluation measures of generated music. In this paper, we introduce a grading function that evaluates four-part chorales in the style of J.S. Bach along important musical features. We use the grading function to evaluate the output of a Transformer model, and show that the function is both interpretable and outperforms human experts at discriminating Bach chorales from model-generated ones.
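The abstract does not specify which musical features the grading function uses or how they are combined. Purely as an illustration of what a feature-based, interpretable grade could look like, the sketch below assumes two hypothetical features: a pitch-class distribution compared against a reference distribution (for example, one computed from a Bach corpus) using total variation distance, and a count of consecutive perfect fifths and octaves, a classic voice-leading error in chorale writing. The chorale representation, the function names, the distance, and the `error_weight` parameter are all illustrative assumptions, not the paper's method.

```python
from collections import Counter
from itertools import combinations

# Assumption: a chorale is four voices (soprano, alto, tenor, bass), each a
# list of MIDI pitch numbers aligned on a shared beat grid.
Chorale = list[list[int]]


def pitch_class_histogram(chorale: Chorale) -> list[float]:
    """Normalized distribution of the 12 pitch classes across all voices."""
    counts = Counter(p % 12 for voice in chorale for p in voice)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("chorale contains no notes")
    return [counts[pc] / total for pc in range(12)]


def histogram_distance(h1: list[float], h2: list[float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def parallel_perfect_intervals(chorale: Chorale) -> int:
    """Count consecutive perfect fifths/octaves (mod 12) where both voices move."""
    errors = 0
    for upper, lower in combinations(chorale, 2):
        for t in range(1, min(len(upper), len(lower))):
            prev = abs(upper[t - 1] - lower[t - 1]) % 12
            curr = abs(upper[t] - lower[t]) % 12
            both_move = upper[t] != upper[t - 1] and lower[t] != lower[t - 1]
            if both_move and prev == curr and curr in (0, 7):
                errors += 1
    return errors


def grade(chorale: Chorale, reference_hist: list[float], error_weight: float = 0.1) -> float:
    """Hypothetical grade (lower is better): feature distance plus a per-error penalty."""
    dist = histogram_distance(pitch_class_histogram(chorale), reference_hist)
    return dist + error_weight * parallel_perfect_intervals(chorale)


# Two-chord examples: C major to D minor in strict parallel motion, and a
# C major to F major progression with no consecutive fifths or octaves.
parallel = [[67, 69], [64, 65], [60, 62], [48, 50]]
clean = [[72, 72], [67, 69], [64, 65], [48, 53]]

if __name__ == "__main__":
    reference = pitch_class_histogram(clean)  # stand-in for a corpus histogram
    print(parallel_perfect_intervals(parallel), parallel_perfect_intervals(clean))
    print(grade(parallel, reference), grade(clean, reference))
```

Keeping each feature as a separate, named quantity is one way to preserve interpretability: a poor grade can be traced back to a specific musical property rather than to an opaque score.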