MLNov 3, 2023
Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimizationAlejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos
Quantifying the difference between two probability density functions, $p$ and $q$, using available data, is a fundamental problem in Statistics and Machine Learning. A usual approach for addressing this problem is the likelihood-ratio estimation (LRE) between $p$ and $q$, which -- to our best knowledge -- has been investigated mainly for the offline case. This paper contributes by introducing a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time. The non-parametric nature of our approach has the advantage of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the recent advances in Kernel Methods and functional minimization to develop an estimator that can be efficiently updated online. We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
MLMay 28, 2022
Collaborative likelihood-ratio estimation over graphsAlejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos
Assuming we have iid observations from two unknown probability density functions (pdfs), $p$ and $q$, the likelihood-ratio estimation (LRE) is an elegant approach to compare the two pdfs only by relying on the available data. In this paper, we introduce the first -to the best of our knowledge-graph-based extension of this problem, which reads as follows: Suppose each node $v$ of a fixed graph has access to observations coming from two unknown node-specific pdfs, $p_v$ and $q_v$, and the goal is to estimate for each node the likelihood-ratio between both pdfs by also taking into account the information provided by the graph structure. The node-level estimation tasks are supposed to exhibit similarities conveyed by the graph, which suggests that the nodes could collaborate to solve them more efficiently. We develop this idea in a concrete non-parametric method that we call Graph-based Relative Unconstrained Least-squares Importance Fitting (GRULSIF). We derive convergence rates for our collaborative approach that highlights the role played by variables such as the number of available observations per node, the size of the graph, and how accurately the graph structure encodes the similarity between tasks. These theoretical results explicit the situations where collaborative estimation effectively leads to an improvement in performance compared to solving each problem independently. Finally, in a series of experiments, we illustrate how GRULSIF infers the likelihood-ratios at the nodes of the graph more accurately compared to state-of-the art LRE methods, which would operate independently at each node, and we also verify that the behavior of GRULSIF is aligned with our previous theoretical analysis.
MLJan 8, 2023
Online Centralized Non-parametric Change-point Detection via Graph-based Likelihood-ratio EstimationAlejandro de la Concha, Argyris Kalogeratos, Nicolas Vayatis
Consider each node of a graph to be generating a data stream that is synchronized and observed at near real-time. At a change-point $τ$, a change occurs at a subset of nodes $C$, which affects the probability distribution of their associated node streams. In this paper, we propose a novel kernel-based method to both detect $τ$ and localize $C$, based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distributions of the node streams. Our main working hypothesis is the smoothness of the likelihood-ratio estimates over the graph, i.e connected nodes are expected to have similar likelihood-ratios. The quality of the proposed method is demonstrated on extensive experiments on synthetic scenarios.
LGJul 7, 2021Code
ADAPT : Awesome Domain Adaptation Python ToolboxAntoine de Mathelin, Mounir Atiq, Guillaume Richard et al.
In this paper, we introduce the ADAPT library, an open source Python API providing the implementation of the main transfer learning and domain adaptation methods. The library is designed with a user friendly approach to facilitate the access to domain adaptation for a wide public. ADAPT is compatible with scikit-learn and TensorFlow and a full documentation is proposed online https://adapt-python.github.io/adapt/ with a substantial gallery of examples.
MLFeb 8, 2024
Collaborative non-parametric two-sample testingAlejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos
This paper addresses the multiple two-sample test problem in a graph-structured setting, which is a common scenario in fields such as Spatial Statistics and Neuroscience. Each node $v$ in fixed graph deals with a two-sample testing problem between two node-specific probability density functions (pdfs), $p_v$ and $q_v$. The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected, under the assumption that connected nodes would yield similar test outcomes. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure and minimizes the assumptions over $p_v$ and $q_v$. Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning. We use synthetic experiments and a real sensor network detecting seismic activity to demonstrate that CTST outperforms state-of-the-art non-parametric statistical tests that apply at each node independently, hence disregard the geometry of the problem.
MLOct 20, 2021
Online non-parametric change-point detection for heterogeneous data streams observed over graph nodesAlejandro de la Concha, Argyris Kalogeratos, Nicolas Vayatis
Consider a heterogeneous data stream being generated by the nodes of a graph. The data stream is in essence composed by multiple streams, possibly of different nature that depends on each node. At a given moment $τ$, a change-point occurs for a subset of nodes $C$, signifying the change in the probability distribution of their associated streams. In this paper we propose an online non-parametric method to infer $τ$ based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distribution associated with the data stream of each node. We propose a kernel-based method, under the hypothesis that connected nodes of the graph are expected to have similar likelihood-ratio estimates when there is no change-point. We demonstrate the quality of our method on synthetic experiments and real-world applications.
LGJun 18, 2020
Offline detection of change-points in the mean for stationary graph signalsAlejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos
This paper addresses the problem of segmenting a stream of graph signals: we aim to detect changes in the mean of a multivariate signal defined over the nodes of a known graph. We propose an offline method that relies on the concept of graph signal stationarity and allows the convenient translation of the problem from the original vertex domain to the spectral domain (Graph Fourier Transform), where it is much easier to solve. Although the obtained spectral representation is sparse in real applications, to the best of our knowledge this property has not been sufficiently exploited in the existing related literature. Our change-point detection method adopts a model selection approach that takes into account the sparsity of the spectral representation and determines automatically the number of change-points. Our detector comes with a proof of a non-asymptotic oracle inequality. Numerical experiments demonstrate the performance of the proposed method.