CLDec 30, 2022
Black-box language model explanation by context length probingOndřej Cífka, Antoine Liutkus
The increasingly widespread adoption of large language models has highlighted the need for improving their explainability. We present context length probing, a novel explanation technique for causal language models, based on tracking the predictions of a model as a function of the length of available context, and allowing to assign differential importance scores to different contexts. The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities. We apply context length probing to large pre-trained language models and offer some initial analyses and insights, including the potential for studying long-range dependencies. The source code and an interactive demo of the method are available.
ASApr 17, 2018Code
The 2018 Signal Separation Evaluation CampaignFabian-Robert Stöter, Antoine Liutkus, Nobutaka Ito
This paper reports the organization and results for the 2018 community-based Signal Separation Evaluation Campaign (SiSEC 2018). This year's edition was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning based systems. For this purpose, we prepared a new music separation database: MUSDB18, featuring close to 10h of audio. Additionally, open-source software was released to automatically load, process and report performance on MUSDB18. Furthermore, a new official Python version for the BSSEval toolbox was released, along with reference implementations for three oracle separation methods: ideal binary mask, ideal ratio mask, and multichannel Wiener filter. We finally report the results obtained by the participants.
LGMay 18, 2021
Relative Positional Encoding for Transformers with Linear ComplexityAntoine Liutkus, Ondřej Cífka, Shih-Lun Wu et al.
Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear-variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement to the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
AIMay 10, 2019
AI in the media and creative industriesGiuseppe Amato, Malte Behrmann, Frédéric Bimbot et al.
Thanks to the Big Data revolution and increasing computing capacities, Artificial Intelligence (AI) has made an impressive revival over the past few years and is now omnipresent in both research and industry. The creative sectors have always been early adopters of AI technologies and this continues to be the case. As a matter of fact, recent technological developments keep pushing the boundaries of intelligent systems in creative applications: the critically acclaimed movie "Sunspring", released in 2016, was entirely written by AI technology, and the first-ever Music Album, called "Hello World", produced using AI has been released this year. Simultaneously, the exploratory nature of the creative process is raising important technical challenges for AI such as the ability for AI-powered techniques to be accurate under limited data resources, as opposed to the conventional "Big Data" approach, or the ability to process, analyse and match data from multiple modalities (text, sound, images, etc.) at the same time. The purpose of this white paper is to understand future technological advances in AI and their growing impact on creative industries. This paper addresses the following questions: Where does AI operate in creative Industries? What is its operative role? How will AI transform creative industries in the next ten years? This white paper aims to provide a realistic perspective of the scope of AI actions in creative industries, proposes a vision of how this technology could contribute to research and development works in such context, and identifies research and development challenges.
SDFeb 8, 2019
Speech enhancement with variational autoencoders and alpha-stable distributionsSimon Leglaive, Umut Simsekli, Antoine Liutkus et al.
This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders. The noise model remains unsupervised because we do not assume prior knowledge of the noisy recording environment. In this context, our contribution is to propose a noise model based on alpha-stable distributions, instead of the more conventional Gaussian non-negative matrix factorization approach found in previous studies. We develop a Monte Carlo expectation-maximization algorithm for estimating the model parameters at test time. Experimental results show the superiority of the proposed approach both in terms of perceptual quality and intelligibility of the enhanced speech signal.
MLJun 21, 2018
Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and DiffusionsAntoine Liutkus, Umut Şimşekli, Szymon Majewski et al.
By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is close to the data distribution as much as possible and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. To the best of our knowledge, the proposed algorithm is the first nonparametric IGM algorithm with explicit theoretical guarantees. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions.
SDApr 23, 2018
An Overview of Lead and Accompaniment Separation in MusicZafar Rafii, Antoine Liutkus, Fabian-Robert Stöter et al.
Popular music is often composed of an accompaniment and a lead component, the latter typically consisting of vocals. Filtering such mixtures to extract one or both components has many applications, such as automatic karaoke and remixing. This particular case of source separation yields very specific challenges and opportunities, including the particular complexity of musical structures, but also relevant prior knowledge coming from acoustics, musicology or sound engineering. Due to both its importance in applications and its challenging difficulty, lead and accompaniment separation has been a popular topic in signal processing for decades. In this article, we provide a comprehensive review of this research topic, organizing the different approaches according to whether they are model-based or data-centered. For model-based methods, we organize them according to whether they concentrate on the lead signal, the accompaniment, or both. For data-centered approaches, we discuss the particular difficulty of obtaining data for learning lead separation systems, and then review recent approaches, notably those based on deep learning. Finally, we discuss the delicate problem of evaluating the quality of music separation through adequate metrics and present the results of the largest evaluation, to-date, of lead and accompaniment separation systems. In conjunction with the above, a comprehensive list of references is provided, along with relevant pointers to available implementations and repositories.
MLNov 13, 2017
Blind Source Separation Using Mixtures of Alpha-Stable DistributionsNicolas Keriven, Antoine Deleforge, Antoine Liutkus
We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have been recently showed to better model audio signals in the time-frequency domain than classical Gaussian distributions thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability density functions do not have a closed-form expression in general. Here, we introduce a novel method for estimating mixture of alpha-stable distributions based on characteristic function matching. We apply this to the blind estimation of binary masks in individual frequency bands from multichannel convolutive audio mixes. We show that the proposed method yields better separation performance than Gaussian-based binary-masking methods.
SDAug 5, 2016
Lévy NMF for robust nonnegative source separationPaul Magron, Roland Badeau, Antoine Liutkus
Source separation, which consists in decomposing data into meaningful structured components, is an active research topic in many areas, such as music and image signal processing, applied physics and text mining. In this paper, we introduce the Positive $α$-stable (P$α$S) distributions to model the latent sources, which are a subclass of the stable distributions family. They notably permit us to model random variables that are both nonnegative and impulsive. Considering the Lévy distribution, the only P$α$S distribution whose density is tractable, we propose a mixture model called Lévy Nonnegative Matrix Factorization (Lévy NMF). This model accounts for low-rank structures in nonnegative data that possibly has high variability or is corrupted by very adverse noise. The model parameters are estimated in a maximum-likelihood sense. We also derive an estimator of the sources given the parameters, which extends the validity of the generalized Wiener filtering to the P$α$S case. Experiments on synthetic data show that Lévy NMF compares favorably with state-of-the art techniques in terms of robustness to impulsive noise. The analysis of two types of realistic signals is also considered: musical spectrograms and fluorescence spectra of chemical species. The results highlight the potential of the Lévy NMF model for decomposing nonnegative data.
SDJan 19, 2015
Listening to featuresManuel Moussallam, Antoine Liutkus, Laurent Daudet
This work explores nonparametric methods which aim at synthesizing audio from low-dimensionnal acoustic features typically used in MIR frameworks. Several issues prevent this task to be straightforwardly achieved. Such features are designed for analysis and not for synthesis, thus favoring high-level description over easily inverted acoustic representation. Whereas some previous studies already considered the problem of synthesizing audio from features such as Mel-Frequency Cepstral Coefficients, they mainly relied on the explicit formula used to compute those features in order to inverse them. Here, we instead adopt a simple blind approach, where arbitrary sets of features can be used during synthesis and where reconstruction is exemplar-based. After testing the approach on a speech synthesis from well known features problem, we apply it to the more complex task of inverting songs from the Million Song Dataset. What makes this task harder is twofold. First, that features are irregularly spaced in the temporal domain according to an onset-based segmentation. Second the exact method used to compute these features is unknown, although the features for new audio can be computed using their API as a black-box. In this paper, we detail these difficulties and present a framework to nonetheless attempting such synthesis by concatenating audio samples from a training dataset, whose features have been computed beforehand. Samples are selected at the segment level, in the feature space with a simple nearest neighbor search. Additionnal constraints can then be defined to enhance the synthesis pertinence. Preliminary experiments are presented using RWC and GTZAN audio datasets to synthesize tracks from the Million Song Dataset.