a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors
This provides a scalable method for constructing graphs in machine learning applications where no natural graph exists, though it is incremental as it builds on TMFG with approximations.
The paper tackles the scalability limitations of Triangular Maximally Filtered Graph (TMFG) construction, which traditionally requires dense correlation matrices, by introducing the a-TMFG algorithm that uses k-Nearest Neighbors Graphs and on-the-fly correlation estimation, enabling application to datasets with millions of observations.
The traditional Triangular Maximally Filtered Graph (TMFG) construction requires pre-computation and storage of a dense correlation matrix; this limits its applicability to small and medium-sized datasets. Here we identify key memory and runtime complexity challenges when using TMFG at scale. We then present the Approximate Triangular Maximally Filtered Graph (a-TMFG) algorithm. This is a novel approach to scaling the construction of artificial graphs from data inspired by TMFG. The method employs k-Nearest Neighbors Graphs (kNNG) for initial construction, and implements a memory management strategy to search and estimate missing correlations on-the-fly. This provides representations to control combinatorial explosion. The algorithm is tested for robustness to the parameters and noise, and is evaluated on datasets with millions of observations. This new method provides a parsimonious way to construct graphs for use-cases where graphs are used as input to supervised and unsupervised learning but where no natural graph exists.