Early MFCC And HPCP Fusion for Robust Cover Song Identification
This work addresses robust cover song identification for music information retrieval; it represents a strong, task-specific gain rather than a foundational breakthrough.
The paper tackles cover song identification by fusing MFCC and HPCP features in an unsupervised algorithm. It achieves a state-of-the-art mean reciprocal rank (MRR) of 0.87 on the Covers80 dataset and an MRR of 0.9 on the newly introduced Covers 1000 dataset.
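Mean reciprocal rank summarizes how high the first correct match appears in each query's ranked list. The sketch below is a minimal illustration and is not the paper's evaluation code. It assumes a precomputed all-pairs similarity matrix (higher means more similar) and one clique label per song. The function name `mean_reciprocal_rank`, the tie-breaking by song index, and the choice to skip queries with no clique-mate are illustrative assumptions. For each query, every other song is ranked by similarity. The 1-based rank of the first song from the same clique is inverted, which matches the abstract's "first correctly identified song in a clique." The reciprocals are then averaged over queries.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(sim: Sequence[Sequence[float]], cliques: Sequence[int]) -> float:
    """Average of 1/rank of the first same-clique song, over all queries.

    sim[i][j] is the similarity of song j to query song i (higher = closer);
    cliques[i] is the clique label of song i. Queries with no clique-mate are
    skipped. Ties keep song-index order (an illustrative assumption).
    """
    n = len(cliques)
    reciprocals = []
    for q in range(n):
        # Rank all other songs by decreasing similarity to the query.
        ranked = sorted((j for j in range(n) if j != q), key=lambda j: -sim[q][j])
        # 1-based rank of the first song that shares the query's clique.
        first: Optional[int] = next(
            (r for r, j in enumerate(ranked, start=1) if cliques[j] == cliques[q]),
            None,
        )
        if first is not None:
            reciprocals.append(1.0 / first)
    return sum(reciprocals) / len(reciprocals) if reciprocals else 0.0


# Toy example: songs 0 and 1 form one clique, songs 2 and 3 another.
sim = [
    [0.0, 0.9, 0.2, 0.1],
    [0.5, 0.0, 0.3, 0.8],
    [0.2, 0.3, 0.0, 0.7],
    [0.6, 0.8, 0.4, 0.0],
]
print(mean_reciprocal_rank(sim, [0, 0, 1, 1]))
```

In the toy example, the per-query reciprocal ranks are 1, 1/2, 1 and 1/3, so the function returns 17/24 (about 0.708). Another common convention counts queries without a clique-mate as zero instead of skipping them.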
While most schemes for automatic cover song identification have focused on note-based features such as HPCP and chord profiles, a few recent papers surprisingly showed that local self-similarities of MFCC-based features also have classification power for this task. Since MFCC and HPCP capture complementary information, we design an unsupervised algorithm that combines normalized, beat-synchronous blocks of these features using cross-similarity fusion before attempting to locally align a pair of songs. As an added bonus, our scheme naturally incorporates structural information in each song to fill in alignment gaps where both feature sets fail. We show a striking jump in performance over MFCC and HPCP alone, achieving a state-of-the-art mean reciprocal rank (MRR) of 0.87 on the Covers80 dataset. We also introduce a new medium-sized, hand-designed benchmark dataset called "Covers 1000," which consists of 395 cliques of cover songs for a total of 1000 songs, and we show that our algorithm achieves an MRR of 0.9 on this dataset for the first correctly identified song in a clique. We provide the precomputed HPCP and MFCC features, as well as beat intervals, for all songs in the Covers 1000 dataset for use in further research.
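The pipeline outlined in the abstract can be illustrated with a small pure-Python sketch. It computes per-song block features, builds a cross-similarity matrix for each feature type, fuses the two matrices, and scores a local alignment. Every concrete choice here is an assumption made for illustration:

- Each input row is one already-computed, flattened beat-synchronous block.
- Blocks are scaled to unit length.
- Distances become similarities through a Gaussian kernel whose bandwidth is the median distance.
- The MFCC and HPCP cross-similarity matrices are fused by a plain elementwise mean.
- The fused matrix is binarized by mutual k-nearest neighbors.
- A Smith-Waterman recurrence with hypothetical match, mismatch, and gap scores produces the alignment score.

The mean is a deliberately naive stand-in for the paper's cross-similarity fusion. The sketch also omits the use of each song's own structure to fill alignment gaps. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_blocks(blocks: Matrix) -> Matrix:
    """Scale each flattened block to unit Euclidean norm (assumed normalization)."""
    out = []
    for block in blocks:
        norm = math.sqrt(sum(x * x for x in block)) or 1.0
        out.append([x / norm for x in block])
    return out


def cross_distance(a: Matrix, b: Matrix) -> Matrix:
    """Euclidean distance from every block of song A to every block of song B."""
    return [[math.dist(x, y) for y in b] for x in a]


def to_similarity(d: Matrix) -> Matrix:
    """Gaussian kernel on distances; bandwidth = median distance (assumed choice)."""
    flat = sorted(v for row in d for v in row)
    sigma = flat[len(flat) // 2] or 1.0
    return [[math.exp(-(v * v) / (2.0 * sigma * sigma)) for v in row] for row in d]


def fuse_mean(s1: Matrix, s2: Matrix) -> Matrix:
    """Naive fusion by elementwise mean (a stand-in, not the paper's fusion)."""
    return [[(x + y) / 2.0 for x, y in zip(r1, r2)] for r1, r2 in zip(s1, s2)]


def mutual_knn_binary(s: Matrix, k: int) -> List[List[int]]:
    """1 where B-block j is in A-block i's top k AND i is in j's top k."""
    n, m = len(s), len(s[0])
    row_top = [set(sorted(range(m), key=lambda j: -s[i][j])[:k]) for i in range(n)]
    col_top = [set(sorted(range(n), key=lambda i: -s[i][j])[:k]) for j in range(m)]
    return [[int(j in row_top[i] and i in col_top[j]) for j in range(m)] for i in range(n)]


def smith_waterman(b: List[List[int]], match: float = 2.0,
                   mismatch: float = -1.0, gap: float = -1.0) -> float:
    """Best local-alignment score over a binary cross-recurrence matrix."""
    n, m = len(b), len(b[0])
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend diagonally (reward a recurrence, penalize a miss) or open a gap.
            diag = h[i - 1][j - 1] + (match if b[i - 1][j - 1] else mismatch)
            h[i][j] = max(0.0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def fused_alignment_score(mfcc_a: Matrix, hpcp_a: Matrix,
                          mfcc_b: Matrix, hpcp_b: Matrix, k: int = 2) -> float:
    """Cross-similarity per feature type, then fuse, binarize, and align locally."""
    sims = [
        to_similarity(cross_distance(normalize_blocks(fa), normalize_blocks(fb)))
        for fa, fb in ((mfcc_a, mfcc_b), (hpcp_a, hpcp_b))
    ]
    return smith_waterman(mutual_knn_binary(fuse_mean(sims[0], sims[1]), k))


# Toy "song" with three blocks per feature type, compared against itself.
mfcc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hpcp = [[0.2, 0.8, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(fused_alignment_score(mfcc, hpcp, mfcc, hpcp, k=2))
```

A song compared with itself has a 1 on every diagonal cell of the binary matrix, so the toy call aligns three consecutive matches and scores 3 × 2.0 = 6.0. In a retrieval experiment, such pairwise scores for every pair of songs would populate the similarity matrix from which MRR is computed.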