Mahajabin Rahman

1.2DATA-ANMay 7, 2023

Inferring Local Structure from Pairwise Correlations

Mahajabin Rahman, Ilya Nemenman

To construct models of large, multivariate complex systems, such as those in biology, one needs to constrain which variables are allowed to interact. This can be viewed as detecting "local" structures among the variables. In the context of a simple toy model of 2D natural and synthetic images, we show that pairwise correlations between the variables -- even when severely undersampled -- provide enough information to recover local relations, including the dimensionality of the data, and to reconstruct arrangement of pixels in fully scrambled images. This proves to be successful even though higher order interaction structures are present in our data. We build intuition behind the success, which we hope might contribute to modeling complex, multivariate systems and to explaining the success of modern attention-based machine learning approaches.

2.5MLDec 29, 2016

Quantum Clustering and Gaussian Mixtures

Mahajabin Rahman, Davi Geiger

The mixture of Gaussian distributions, a soft version of k-means , is considered a state-of-the-art clustering algorithm. It is widely used in computer vision for selecting classes, e.g., color, texture, and shapes. In this algorithm, each class is described by a Gaussian distribution, defined by its mean and covariance. The data is described by a weighted sum of these Gaussian distributions. We propose a new method, inspired by quantum interference in physics. Instead of modeling each class distribution directly, we model a class wave function such that its magnitude square is the class Gaussian distribution. We then mix the class wave functions to create the mixture wave function. The final mixture distribution is then the magnitude square of the mixture wave function. As a result, we observe the quantum class interference phenomena, not present in the Gaussian mixture model. We show that the quantum method outperforms the Gaussian mixture method in every aspect of the estimations. It provides more accurate estimations of all distribution parameters, with much less fluctuations, and it is also more robust to data deformations from the Gaussian assumptions. We illustrate our method for color segmentation as an example application.

Mahajabin Rahman

2 Papers