CV | Jan 29, 2024 | Code
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Carlo Bretti, Pascal Mettes, Hendrik Vincent Koops et al.
Creating a trailer requires carefully picking out and piecing together brief, enticing moments from a longer video, making it a challenging and time-consuming task. This requires selecting moments based on both visual and dialogue information. We introduce a multi-modal method for predicting trailerness to assist editors in selecting trailer-worthy moments from long-form videos. We present results on a newly introduced soap opera dataset, demonstrating that predicting trailerness is a challenging task that benefits from multi-modal information. Code is available at https://github.com/carlobretti/cliffhanger
SD | Oct 17, 2023
Serenade: A Model for Human-in-the-loop Automatic Chord Estimation
Hendrik Vincent Koops, Gianluca Micchi, Ilaria Manco et al.
Computational harmony analysis is important for MIR tasks such as automatic segmentation, corpus analysis and automatic chord label estimation. However, recent research into the ambiguous nature of musical harmony, causing limited inter-rater agreement, has made apparent that there is a glass ceiling for common metrics such as accuracy. Commonly, these issues are addressed either in the training data itself by creating majority-rule annotations or during the training phase by learning soft targets. We propose a novel alternative approach in which a human and an autoregressive model together co-create a harmonic annotation for an audio track. After automatically generating harmony predictions, a human sparsely annotates parts with low model confidence and the model then adjusts its predictions following human guidance. We evaluate our model on a dataset of popular music and we show that, with this human-in-the-loop approach, harmonic analysis performance improves over a model-only approach. The human contribution is amplified by the second, constrained prediction of the model.
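The annotate-where-uncertain step described in this abstract can be illustrated with a minimal sketch. It is not the Serenade model: the function names, the per-frame probability layout, and the simple overwrite of model output by human labels are all assumptions made for illustration. In particular, the abstract's second, constrained prediction pass is replaced here by a plain merge.

```python
from typing import Dict, List, Sequence


def least_confident_frames(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k frames whose top chord probability is lowest.

    `probs[t][c]` is assumed to be the model's probability of chord class c
    at frame t (a hypothetical layout, not taken from the paper).
    """
    confidence = [max(frame) for frame in probs]
    order = sorted(range(len(probs)), key=lambda i: confidence[i])
    return sorted(order[:k])


def merge_annotations(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    human: Dict[int, str],
) -> List[str]:
    """Take the model's argmax label per frame, then overwrite the frames
    a human annotated. A stand-in for a constrained re-prediction."""
    predicted = [
        labels[max(range(len(frame)), key=frame.__getitem__)] for frame in probs
    ]
    for idx, chord in human.items():
        predicted[idx] = chord
    return predicted


labels = ["C", "G", "Am"]
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]
to_review = least_confident_frames(probs, k=2)   # frames shown to the human
final = merge_annotations(probs, labels, {1: "Am", 3: "G"})
```

In this toy input, frames 1 and 3 have the flattest distributions, so they are the ones a human would be asked to label.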
SD | Oct 12, 2020
AI Song Contest: Human-AI Co-Creation in Songwriting
Cheng-Zhi Anna Huang, Hendrik Vincent Koops, Ed Newton-Rex et al.
Machine learning is challenging the way we make music. Although research in deep generative models has dramatically improved the capability and fluency of music models, recent work has shown that it can be challenging for humans to partner with this new class of algorithms. In this paper, we present findings on what 13 musician/developer teams, a total of 61 users, needed when co-creating a song with AI, the challenges they faced, and how they leveraged and repurposed existing characteristics of AI to overcome some of these challenges. Many teams adopted modular approaches, such as independently running multiple smaller models that align with the musical building blocks of a song, before re-combining their results. As ML models are not easily steerable, teams also generated massive numbers of samples and curated them post-hoc, or used a range of strategies to direct the generation, or algorithmically ranked the samples. Ultimately, teams not only had to manage the "flare and focus" aspects of the creative process, but also juggle them with a parallel process of exploring and curating multiple ML models and outputs. These findings reflect a need to design machine learning-powered music interfaces that are more decomposable, steerable, interpretable, and adaptive, which in return will enable artists to more effectively explore how AI can extend their personal expression.
SD | Feb 22, 2020
DECIBEL: Improving Audio Chord Estimation for Popular Music by Alignment and Integration of Crowd-Sourced Symbolic Representations
Daphne Odekerken, Hendrik Vincent Koops, Anja Volk
Automatic Chord Estimation (ACE) is a fundamental task in Music Information Retrieval (MIR) and has applications in both music performance and MIR research. The task consists of segmenting a music recording or score and assigning a chord label to each segment. Although it has been a task in the annual benchmarking evaluation MIREX for over 10 years, ACE is not yet a solved problem, since performance has stagnated and modern systems have started to tune themselves to subjective training data. We propose DECIBEL, a new ACE system that exploits widely available MIDI and tab representations to improve ACE from audio only. From an audio file and a set of MIDI and tab files corresponding to the same popular music song, DECIBEL first estimates chord sequences. For audio, state-of-the-art audio ACE methods are used. MIDI files are aligned to the audio, followed by a MIDI chord estimation step. Tab files are transformed into untimed chord sequences and then aligned to the audio. Next, DECIBEL uses data fusion to integrate all estimated chord sequences into one final output sequence. DECIBEL improves all tested state-of-the-art ACE methods by over 3 percent on average. This result shows that the integration of musical knowledge from heterogeneous symbolic music representations is a suitable strategy for addressing challenging MIR tasks such as ACE.
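The final integration step, combining chord sequences estimated from audio, MIDI, and tabs into one output, can be sketched as follows. The abstract does not specify DECIBEL's data-fusion method, so this example swaps in a plain weighted majority vote per frame. The function name `fuse_by_vote`, the source names, and the weights are hypothetical, and all sequences are assumed to be already aligned to a common frame grid.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fuse_by_vote(
    sequences: Dict[str, Sequence[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Fuse aligned chord sequences by weighted majority vote per frame.

    This is an illustrative stand-in, not DECIBEL's fusion algorithm.
    Sources missing from `weights` default to a weight of 1.0.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same frame grid")
    n_frames = lengths.pop()

    fused = []
    for t in range(n_frames):
        scores = defaultdict(float)
        for source, seq in sequences.items():
            scores[seq[t]] += weights.get(source, 1.0)
        # On a tie, the chord seen first (in source order) wins.
        fused.append(max(scores, key=scores.get))
    return fused


estimates = {
    "audio": ["C", "C", "G", "G"],
    "midi":  ["C", "F", "G", "Am"],
    "tab":   ["C", "F", "G", "G"],
}
equal = fuse_by_vote(estimates, {})
audio_heavy = fuse_by_vote(estimates, {"audio": 3.0})
```

With equal weights the two symbolic sources outvote the audio estimate at the second frame. Raising the audio weight reverses that, which shows how the choice of weights encodes trust in each representation.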