SDIRLGASFeb 13, 2022

Learning long-term music representations via hierarchical contextual constraints

arXiv:2202.06180v111 citations
Originality Incremental advance
AI Analysis

This addresses a domain-specific problem for music AI by providing a more stable and effective approach to long-term representation learning, though it is incremental as it builds on existing contrastive and hierarchical techniques.

The paper tackles the challenge of learning long-term symbolic music representations by proposing a method using hierarchical contextual constraints, which stabilizes training and significantly outperforms baselines in reconstruction and disentanglement.

Learning symbolic music representations, especially disentangled representations with probabilistic interpretations, has been shown to benefit both music understanding and generation. However, most models are only applicable to short-term music, while learning long-term music representations remains a challenging task. We have seen several studies attempting to learn hierarchical representations directly in an end-to-end manner, but these models have not been able to achieve the desired results and the training process is not stable. In this paper, we propose a novel approach to learn long-term symbolic music representations through contextual constraints. First, we use contrastive learning to pre-train a long-term representation by constraining its difference from the short-term representation (extracted by an off-the-shelf model). Then, we fine-tune the long-term representation by a hierarchical prediction model such that a good long-term representation (e.g., an 8-bar representation) can reconstruct the corresponding short-term ones (e.g., the 2-bar representations within the 8-bar range). Experiments show that our method stabilizes the training and the fine-tuning steps. In addition, the designed contextual constraints benefit both reconstruction and disentanglement, significantly outperforming the baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes