SD, Dec 6, 2016
FMA: A Dataset For Music Analysis
Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst et al.
We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in feature and end-to-end learning is however restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies. We here describe the dataset and how it was created, propose a train/validation/test split and three subsets, discuss some suitable MIR tasks, and evaluate some baselines for genre recognition. Code, data, and usage examples are available at https://github.com/mdeff/fma
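The headline figures in this abstract (917 GiB, 343 days, 106,574 tracks) imply some per-track averages. The short calculation below derives them, assuming only those three numbers; the variable names are illustrative and the results are rough averages, not statistics reported by the authors.

```python
# Back-of-the-envelope averages derived from the figures quoted in the abstract.
GIB = 2**30                      # bytes per GiB
total_bytes = 917 * GIB          # 917 GiB of audio
total_seconds = 343 * 24 * 3600  # 343 days of audio
n_tracks = 106_574

avg_track_seconds = total_seconds / n_tracks
avg_track_mib = total_bytes / n_tracks / 2**20
avg_bitrate_kbps = total_bytes * 8 / total_seconds / 1000

print(f"average track length: {avg_track_seconds:.0f} s")
print(f"average track size:   {avg_track_mib:.1f} MiB")
print(f"average bitrate:      {avg_bitrate_kbps:.0f} kbit/s")
```

The averages come out to a little over four and a half minutes and just under 9 MiB per track, which is consistent with the abstract's description of full-length, high-quality audio.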
SI, Jan 22, 2019
Anomaly detection in the dynamics of web and social networks
Volodymyr Miz, Benjamin Ricaud, Kirell Benzi et al.
In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. The graph may be static with dynamic node attributes (e.g. time-series), or it may itself evolve in time, as in a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with good precision using a memory network. The presented approach is scalable and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: the Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by real-world events that impact the network dynamics. In addition, the structure of the clusters and the analysis of the time evolution associated with the detected events reveal interesting facts about how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.
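For readers unfamiliar with the Hopfield network model of memory mentioned in this abstract, the sketch below shows the classic textbook version: patterns are stored with a Hebbian outer-product rule and recalled by repeated sign updates. It is a generic illustration, not the authors' anomaly detection algorithm; the function names `train_hopfield` and `recall` and the toy patterns are assumptions made for this example.

```python
def train_hopfield(patterns):
    """Build a weight matrix from +1/-1 patterns using the Hebbian rule."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, max_sweeps=10):
    """Asynchronously update units until the state stops changing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


# Two orthogonal toy patterns (hypothetical data).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([p1, p2])

noisy = [-1] + p1[1:]            # p1 with its first unit flipped
restored = recall(weights, noisy)
print(restored == p1)
```

The stored patterns act as attractors: a corrupted input is pulled back to the nearest memory, which is the property that makes this family of models useful for combining structure with noisy observations.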
IR, Oct 1, 2017
Wikipedia graph mining: dynamic structure of collective memory
Volodymyr Miz, Kirell Benzi, Benjamin Ricaud et al.
Wikipedia is the biggest encyclopedia ever created and the fifth most visited website in the world. Tens of millions of people surf it every day, seeking answers to various questions. Collective user activity on its pages leaves publicly available footprints of human behavior, making Wikipedia an excellent source for analysis of collective behavior. In this work, we propose a distributed graph-based event extraction model, inspired by the Hebbian learning theory. The model exploits the collective effect of the dynamics to discover events. We focus on data-streams with an underlying graph structure and perform several large-scale experiments on the Wikipedia visitor activity data. We show that the presented model is scalable with respect to time-series length and graph density, and we provide a distributed implementation of the proposed algorithm. We extract dynamical patterns of collective activity and demonstrate that they correspond to meaningful clusters of associated events, reflected in the Wikipedia articles. We also illustrate the evolutionary dynamics of the graphs over time to highlight the changing nature of visitors' interests. Finally, we discuss clusters of events that model the collective recall process and represent collective memories - common memories shared by a group of people.
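The Hebbian idea referenced in this abstract ("units that fire together wire together") can be illustrated on a small graph of pages: an edge is strengthened whenever both of its endpoints show a burst of visits at the same time step. The sketch below is a simplified illustration under stated assumptions, not the model from the paper; the burst rule (mean plus `k` standard deviations), the function names, and the toy page-view data are all hypothetical.

```python
from statistics import mean, pstdev


def bursts(series, k=1.5):
    """Mark time steps whose value exceeds mean + k * standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [x > mu + k * sigma for x in series]


def hebbian_edge_weights(edges, activity, k=1.5):
    """Count, for each edge, the time steps where both endpoints burst together."""
    active = {node: bursts(series, k) for node, series in activity.items()}
    weights = {}
    for u, v in edges:
        weights[(u, v)] = sum(1 for a, b in zip(active[u], active[v]) if a and b)
    return weights


# Hypothetical hourly page views for three linked pages.
activity = {
    "A": [1, 1, 1, 10, 1, 1],
    "B": [2, 2, 2, 20, 2, 2],
    "C": [5, 5, 5, 5, 5, 50],
}
edges = [("A", "B"), ("B", "C")]

weights = hebbian_edge_weights(edges, activity)
kept = [edge for edge, w in weights.items() if w > 0]
print(weights)
print(kept)
```

Pages A and B spike at the same time step, so their edge is reinforced, while C spikes alone and its edge to B stays at zero. Keeping only reinforced edges leaves clusters of pages with synchronized activity, which is the intuition behind reading such clusters as events.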
ML, Jan 8, 2016
Song Recommendation with Non-Negative Matrix Factorization and Graph Total Variation
Kirell Benzi, Vassilis Kalofolias, Xavier Bresson et al.
This work formulates a novel song recommender system as a matrix completion problem that benefits from collaborative filtering through Non-negative Matrix Factorization (NMF) and content-based filtering via total variation (TV) on graphs. The graphs encode both playlist proximity information and song similarity, using a rich combination of audio, meta-data and social features. As we demonstrate, our hybrid recommendation system is very versatile and incorporates several well-known methods while outperforming them. In particular, we show on real-world data that our model outperforms, with respect to two evaluation metrics, models based solely on low-rank information, graph-based information, or a combination of both.
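To make the graph total variation term in this abstract concrete, the sketch below computes the usual weighted TV of a signal on a graph, i.e. the sum over edges of the edge weight times the absolute difference between the endpoint values. This is the generic definition, shown as an assumption about what "TV on graphs" refers to here; the paper's exact regularizer and its weighting may differ, and the graph and signals are toy data.

```python
def graph_total_variation(edge_weights, signal):
    """Sum of w_ij * |x_i - x_j| over all weighted edges (i, j)."""
    return sum(w * abs(signal[i] - signal[j]) for (i, j), w in edge_weights.items())


# Hypothetical similarity graph over four songs (a weighted path 0-1-2-3).
edge_weights = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.5}

constant = [1.0, 1.0, 1.0, 1.0]   # identical scores on every song
piecewise = [1.0, 1.0, 0.0, 0.0]  # one jump, across the edge (1, 2)

tv_constant = graph_total_variation(edge_weights, constant)
tv_piecewise = graph_total_variation(edge_weights, piecewise)
print(tv_constant, tv_piecewise)
```

A penalty of this form is small when connected (similar) songs receive similar scores and grows with every jump across an edge, so it favors recommendations that are piecewise constant over the similarity graph.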