Metric Learning for Temporal Sequence Alignment
This work addresses the alignment of time series, particularly in audio applications. Its contribution is incremental in that it builds on existing metric learning and structured prediction frameworks.
The paper tackles the alignment of multivariate time series by learning a Mahalanobis distance from examples whose true alignment is known, and reports improved performance on audio-to-audio alignment tasks.
In this paper, we propose to learn a Mahalanobis distance to perform alignment of multivariate time series. The learning examples for this task are time series for which the true alignment is known. We cast the alignment problem as a structured prediction task, and propose realistic losses between alignments for which the optimization is tractable. We provide experiments on real data in the audio-to-audio setting, where we show that learning a similarity measure improves the performance of the alignment task. We also propose to use this metric learning framework to perform feature selection and, starting from basic audio features, to build a combination of them that performs better for alignment.
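To make the role of the Mahalanobis distance concrete, the following is a minimal sketch of how a given matrix W could be used inside a standard dynamic time warping (DTW) alignment. The abstract does not name the alignment algorithm, so DTW is an assumption here, as are the function names `mahalanobis_cost` and `dtw_align` and the symbol `W`. The sketch covers only inference with a fixed W; it does not implement the learning procedure or the losses between alignments proposed in the paper.

```python
import numpy as np


def mahalanobis_cost(A, B, W):
    """Pairwise cost C[i, j] = (a_i - b_j)^T W (a_i - b_j).

    A has shape (Ta, d), B has shape (Tb, d), and W is a (d, d)
    positive semidefinite matrix (hypothetical: the learned metric).
    """
    diff = A[:, None, :] - B[None, :, :]  # shape (Ta, Tb, d)
    return np.einsum("ijk,kl,ijl->ij", diff, W, diff)


def dtw_align(C):
    """Return a minimum-cost monotonic alignment path and its total cost.

    Standard DTW recursion with steps (1, 1), (1, 0) and (0, 1); this is
    an assumed stand-in for the alignment model used in the paper.
    """
    Ta, Tb = C.shape
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            D[i, j] = C[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]
            )

    # Backtrack from the last cell to the first, following the cheapest
    # predecessor at each step.
    i, j = Ta, Tb
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path, D[Ta, Tb]


# Toy example: B repeats the first frame of A once.
A = np.array([[0.0], [1.0], [2.0]])
B = np.array([[0.0], [0.0], [1.0], [2.0]])
W = np.eye(1)  # identity metric, i.e. squared Euclidean distance
path, cost = dtw_align(mahalanobis_cost(A, B, W))
```

With W set to the identity, the cost reduces to the squared Euclidean distance. In the setting the abstract describes, W would instead be fitted so that the alignments it produces on the training pairs are close to the known true alignments.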