Lionel Yelibi

CP
4papers
5citations
Novelty46%
AI Score35

4 Papers

12.0MLMar 10
a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

Lionel Yelibi

The traditional Triangular Maximally Filtered Graph (TMFG) construction requires pre-computation and storage of a dense correlation matrix; this limits its applicability to small and medium-sized datasets. Here we identify key memory and runtime complexity challenges when using TMFG at scale. We then present the Approximate Triangular Maximally Filtered Graph (a-TMFG) algorithm. This is a novel approach to scaling the construction of artificial graphs from data inspired by TMFG. The method employs k-Nearest Neighbors Graphs (kNNG) for initial construction, and implements a memory management strategy to search and estimate missing correlations on-the-fly. This provides representations to control combinatorial explosion. The algorithm is tested for robustness to the parameters and noise, and is evaluated on datasets with millions of observations. This new method provides a parsimonious way to construct graphs for use-cases where graphs are used as input to supervised and unsupervised learning but where no natural graph exists.

LGJun 4, 2024
MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning

Shay Deutsch, Lionel Yelibi, Alex Tong Lin et al.

Deriving meaningful representations from complex, high-dimensional data in unsupervised settings is crucial across diverse machine learning applications. This paper introduces a framework for multi-scale graph network embedding based on spectral graph wavelets that employs a contrastive learning approach. We theoretically show that in Paley-Wiener spaces on combinatorial graphs, the spectral graph wavelets operator provides greater flexibility and control over smoothness compared to the Laplacian operator, motivating our approach. A key advantage of the proposed embedding is its ability to establish a correspondence between the embedding and input feature spaces, enabling the derivation of feature importance. We validate the effectiveness of our graph embedding framework on multiple public datasets across various downstream tasks, including clustering and unsupervised feature importance.

CPAug 2, 2019
Agglomerative Likelihood Clustering

Lionel Yelibi, Tim Gebbie

We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables we present an updated fast non-expensive Agglomerative Likelihood Clustering algorithm (ALC). The method replaces the optimized genetic algorithm based approach (f-SPC) with an agglomerative recursive merging framework inspired by previous work in Econophysics and Community Detection. The method is tested on noisy synthetic correlated time-series data-sets with built-in cluster structure to demonstrate that the algorithm produces meaningful non-trivial results. We apply it to time-series data-sets as large as 20,000 assets and we argue that ALC can reduce compute time costs and resource usage cost for large scale clustering for time-series applications while being serialized, and hence has no obvious parallelization requirement. The algorithm can be an effective choice for state-detection for online learning in a fast non-linear data environment because the algorithm requires no prior information about the number of clusters.

CPOct 5, 2018
Fast Super-Paramagnetic Clustering

Lionel Yelibi, Tim Gebbie

We map stock market interactions to spin models to recover their hierarchical structure using a simulated annealing based Super-Paramagnetic Clustering (SPC) algorithm. This is directly compared to a modified implementation of a maximum likelihood approach we call Fast Super-Paramagnetic Clustering (f-SPC). The methods are first applied standard toy test-case problems, and then to a data-set of 447 stocks traded on the New York Stock Exchange (NYSE) over 1249 days. The signal to noise ratio of stock market correlation matrices is briefly considered. Our result recover approximately clusters representative of standard economic sectors and mixed ones whose dynamics shine light on the adaptive nature of financial markets and raise concerns relating to the effectiveness of industry based static financial market classification in the world of real-time data analytics. A key result is that we show that f-SPC maximum likelihood solutions converge to ones found within the Super-Paramagnetic Phase where the entropy is maximum, and those solutions are qualitatively better for high dimensionality data-sets.