Sacha Sokoloski

LG
h-index: 3
Papers: 5
Citations: 10
Novelty: 65%
AI Score: 28

5 Papers

LG · Jun 10, 2022
Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering

Sacha Sokoloski, Philipp Berens

We introduce hierarchical mixtures of Gaussians (HMoGs), which unify dimensionality reduction and clustering into a single probabilistic model. HMoGs provide closed-form expressions for the model likelihood, exact inference over latent states and cluster membership, and exact algorithms for maximum-likelihood optimization. The novel exponential family parameterization of HMoGs greatly reduces their computational complexity relative to similar model-based methods, allowing them to efficiently model hundreds of latent dimensions, and thereby capture additional structure in high-dimensional data. We demonstrate HMoGs on synthetic experiments and MNIST, and show how joint optimization of dimensionality reduction and clustering facilitates increased model performance. We also explore how sparsity-constrained dimensionality reduction can further improve clustering performance while encouraging interpretability. By bridging classical statistical modelling with the scale of modern data and compute, HMoGs offer a practical approach to high-dimensional clustering that preserves statistical rigour, interpretability, and uncertainty quantification that is often missing from embedding-based, variational, and self-supervised methods.

LG · Apr 30, 2024
A Unified Theory of Exact Inference and Learning in Exponential Family Latent Variable Models

Sacha Sokoloski

Bayes' rule describes how to infer posterior beliefs about latent variables given observations, and inference is a critical step in learning algorithms for latent variable models (LVMs). Although there are exact algorithms for inference and learning for certain LVMs such as linear Gaussian models and mixture models, researchers must typically develop approximate inference and learning algorithms when applying novel LVMs. Here we study the line that separates LVMs that rely on approximation schemes from those that do not, and develop a general theory of exponential family LVMs for which inference and learning may be implemented exactly. Firstly, under mild assumptions about the exponential family form of the LVM, we derive a necessary and sufficient constraint on the parameters of the LVM under which the prior and posterior over the latent variables are in the same exponential family. We then show that a variety of well-known and novel models indeed have this constrained, exponential family form. Finally, we derive generalized inference and learning algorithms for these LVMs, and demonstrate them with a variety of examples. Our unified perspective facilitates both understanding and implementing exact inference and learning algorithms for a wide variety of models, and may guide researchers in the discovery of new models that avoid unnecessary approximations.

LG · Aug 1, 2019
Conditional Finite Mixtures of Poisson Distributions for Context-Dependent Neural Correlations

Sacha Sokoloski, Ruben Coen-Cagli

Parallel recordings of neural spike counts have revealed the existence of context-dependent noise correlations in neural populations. Theories of population coding have also shown that such correlations can impact the information encoded by neural populations about external stimuli. Although studies have shown that these correlations often have a low-dimensional structure, it has proven difficult to capture this structure in a model that is compatible with theories of rate coding in correlated populations. To address this difficulty we develop a novel model based on conditional finite mixtures of independent Poisson distributions. The model can be conditioned on context variables (e.g. stimuli or task variables), and the number of mixture components in the model can be cross-validated to estimate the dimensionality of the target correlations. We derive an expectation-maximization algorithm to efficiently fit the model to realistic amounts of data from large neural populations. We then demonstrate that the model successfully captures stimulus-dependent correlations in the responses of macaque V1 neurons to oriented gratings. Our model incorporates arbitrary nonlinear context-dependence, and can thus be applied to improve predictions of neural activity based on deep neural networks.
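As a rough illustration of the expectation-maximization idea mentioned in this abstract, the sketch below fits an unconditional finite mixture of independent Poisson distributions to spike-count vectors. It is not the authors' algorithm: it omits the conditioning on context variables entirely, and the function names (`log_poisson`, `em_step`, `log_likelihood`), the toy data, and the small floor on the rates are all assumptions made for this example.

```python
import math

def log_poisson(count, rate):
    """Log-probability of a Poisson count under the given rate."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def component_log_joint(x, weight, rates):
    """log(weight) + sum over neurons of independent Poisson log-probabilities."""
    return math.log(weight) + sum(log_poisson(c, r) for c, r in zip(x, rates))

def log_likelihood(counts, weights, rates):
    """Total log-likelihood of the data under the mixture."""
    total = 0.0
    for x in counts:
        logp = [component_log_joint(x, w, r) for w, r in zip(weights, rates)]
        m = max(logp)
        total += m + math.log(sum(math.exp(l - m) for l in logp))
    return total

def em_step(counts, weights, rates):
    """One EM iteration. counts: N count vectors; weights: K values; rates: K rate vectors."""
    n_components, n_neurons = len(weights), len(rates[0])
    # E-step: posterior responsibility of each component for each observation.
    resp = []
    for x in counts:
        logp = [component_log_joint(x, w, r) for w, r in zip(weights, rates)]
        m = max(logp)
        p = [math.exp(l - m) for l in logp]
        s = sum(p)
        resp.append([pk / s for pk in p])
    # M-step: responsibility-weighted proportions and mean counts.
    new_weights, new_rates = [], []
    for k in range(n_components):
        nk = sum(r[k] for r in resp)
        new_weights.append(nk / len(counts))
        new_rates.append([
            # Hypothetical floor keeps log(rate) finite if a mean count hits zero.
            max(sum(r[k] * x[d] for r, x in zip(resp, counts)) / nk, 1e-9)
            for d in range(n_neurons)
        ])
    return new_weights, new_rates

# Toy data: two neurons, a low-rate and a high-rate response pattern.
counts = [[0, 1], [1, 0], [9, 10], [10, 8], [11, 9], [1, 1]]
weights, rates = [0.5, 0.5], [[1.0, 1.0], [5.0, 5.0]]
history = [log_likelihood(counts, weights, rates)]
for _ in range(20):
    weights, rates = em_step(counts, weights, rates)
    history.append(log_likelihood(counts, weights, rates))
```

In this simplified setting, shared variability across neurons arises only because the component rates differ; the number of components could be chosen by comparing held-out log-likelihoods, in the spirit of the cross-validation described in the abstract.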

LG · Dec 22, 2015
Implementing a Bayes Filter in a Neural Circuit: The Case of Unknown Stimulus Dynamics

Sacha Sokoloski

In order to interact intelligently with objects in the world, animals must first transform neural population responses into estimates of the dynamic, unknown stimuli which caused them. The Bayesian solution to this problem is known as a Bayes filter, which applies Bayes' rule to combine population responses with the predictions of an internal model. In this paper we present a method for learning to approximate a Bayes filter when the stimulus dynamics are unknown. To do this we use the inferential properties of probabilistic population codes to compute Bayes' rule, and train a neural network to compute approximate predictions by the method of maximum likelihood. In particular, we perform stochastic gradient descent on the negative log-likelihood with a novel approximation of the gradient. We demonstrate our methods on a finite-state, a linear, and a nonlinear filtering problem, and show how the hidden layer of the neural network develops tuning curves which are consistent with findings in experimental neuroscience.
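For readers unfamiliar with the term, a Bayes filter alternates a prediction step (pushing the current belief through a dynamics model) with an update step (applying Bayes' rule to the new observation). The sketch below shows one such step for a finite-state problem. Unlike the setting of the paper, it assumes the transition matrix is known, and the function name and numbers are purely illustrative.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict/update step of a finite-state Bayes filter.

    belief[i]        : P(x_{t-1} = i | observations up to t-1)
    transition[i][j] : P(x_t = j | x_{t-1} = i), assumed known in this sketch
    likelihood[j]    : P(observation_t | x_t = j)
    """
    n = len(belief)
    # Prediction: propagate the previous belief through the dynamics.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: Bayes' rule, i.e. multiply by the likelihood and renormalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

# Two hidden states, a uniform prior belief, and an observation favouring state 0.
belief = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
likelihood = [0.7, 0.1]
posterior = bayes_filter_step(belief, transition, likelihood)
```

When the dynamics are unknown, as in the abstract above, the prediction step is the part that has to be learned; the update step keeps the same form.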

NE · Oct 15, 2012
A Biologically Realistic Model of Saccadic Eye Control with Probabilistic Population Codes

Sacha Sokoloski

The posterior parietal cortex is believed to direct eye movements, especially with regard to target tracking tasks. A number of debates exist over the precise nature of the computations performed by the parietal cortex, with each side supported by different sets of biological evidence. In this paper I present a model which navigates a course between some of these debates, with the aim of explaining some of the competing interpretations among the data sets. In particular, rather than assuming that either proprioception or efference copies form the key source of information for computing eye position, I use a biologically plausible implementation of a Kalman filter to optimally combine the two signals, and a simple gain control mechanism to accommodate the latency of the proprioceptive signal. Fitting within the Bayesian brain hypothesis, the result is a Bayes-optimal solution to the eye control problem, with a range of data supporting claims of biological plausibility.
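To make the idea of combining the two signals concrete, here is a minimal scalar Kalman filter step in which an efference copy (the motor command) drives the prediction and a proprioceptive reading drives the correction. This is only a textbook sketch under simplifying assumptions: it is not the neural implementation described in the paper, it ignores the latency of the proprioceptive signal and the gain control mechanism, and all names and numbers are hypothetical.

```python
def kalman_step(mean, var, command, process_var, measurement, measurement_var):
    """One scalar Kalman filter step for estimating eye position.

    mean, var        : current estimate of eye position and its variance
    command          : commanded displacement (stand-in for the efference copy)
    process_var      : variance of the noise added by executing the command
    measurement      : noisy position reading (stand-in for proprioception)
    measurement_var  : variance of the measurement noise
    """
    # Predict: shift the estimate by the commanded movement, grow the uncertainty.
    pred_mean = mean + command
    pred_var = var + process_var
    # Update: weight the measurement by the Kalman gain.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var

# The prediction (variance 2) and the measurement (variance 2) are equally
# reliable here, so the estimate lands halfway between them.
new_mean, new_var = kalman_step(
    mean=0.0, var=1.0, command=2.0, process_var=1.0,
    measurement=3.0, measurement_var=2.0,
)
```

The gain makes the trade-off explicit: a noisier proprioceptive signal (larger `measurement_var`) shifts the estimate toward the efference-copy prediction, and vice versa.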