SD · AI · LG · Oct 16, 2025

Beat Tracking as Object Detection

arXiv:2510.14391v2 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses beat tracking for music analysis, presenting an incremental adaptation of computer vision methods to audio tasks.

The paper tackles beat and downbeat tracking by reframing it as an object detection problem: beats are modeled as temporal objects and the FCOS detector is adapted to 1D audio, achieving competitive results on standard music datasets.

Recent beat and downbeat tracking models (e.g., RNNs, TCNs, Transformers) output frame-level activations. We propose reframing this task as object detection, where beats and downbeats are modeled as temporal "objects." Adapting the FCOS detector from computer vision to 1D audio, we replace its original backbone with WaveBeat's temporal feature extractor and add a Feature Pyramid Network to capture multi-scale temporal patterns. The model predicts overlapping beat/downbeat intervals with confidence scores, followed by non-maximum suppression (NMS) to select final predictions. This NMS step serves a similar role to DBNs in traditional trackers, but is simpler and less heuristic. Evaluated on standard music datasets, our approach achieves competitive results, showing that object detection techniques can effectively model musical beats with minimal adaptation.
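The NMS step described in the abstract can be illustrated with a minimal sketch of greedy non-maximum suppression over 1D intervals. This is not the authors' implementation. The function names, the `(start, end, score)` tuple layout, the IoU threshold of 0.5, and the choice to read the beat time from the interval start are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (start_sec, end_sec, confidence_score)
Interval = Tuple[float, float, float]


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two 1D intervals (scores are ignored)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def nms_1d(intervals: List[Interval], iou_threshold: float = 0.5) -> List[Interval]:
    """Greedy NMS: visit candidates from highest to lowest score and keep
    one only if it does not overlap an already-kept interval too much."""
    by_score = sorted(intervals, key=lambda iv: iv[2], reverse=True)
    kept: List[Interval] = []
    for candidate in by_score:
        if all(interval_iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    # Return the survivors in temporal order.
    return sorted(kept, key=lambda iv: iv[0])


if __name__ == "__main__":
    # Made-up candidates: two near-duplicates around each of two beats.
    candidates = [
        (0.00, 0.50, 0.9),
        (0.02, 0.52, 0.6),
        (0.50, 1.00, 0.8),
        (0.48, 0.97, 0.4),
    ]
    survivors = nms_1d(candidates)
    # Assumption: the beat time is taken as the start of each kept interval.
    beat_times = [start for start, _, _ in survivors]
    print(beat_times)
```

With these made-up candidates, the lower-scoring near-duplicates are suppressed and one interval per beat remains. The only tunable parameter in this sketch is the IoU threshold, which is consistent with the abstract's description of NMS as simpler than a DBN.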
