Haim Sompolinsky

LG
h-index82
20papers
586citations
Novelty60%
AI Score41

20 Papers

LGOct 31, 2022
Globally Gated Deep Linear Networks

Qianyi Li, Haim Sompolinsky · harvard

Recently proposed Gated Linear Networks present a tractable nonlinear network architecture, and exhibit interesting capabilities such as learning with local error signals and reduced forgetting in sequential learning. In this work, we introduce a novel gating architecture, named Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer, thereby decoupling the architectures of the nonlinear but unlearned gatings and the learned linear processing motifs. We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit, defined by $P,N\rightarrow\infty, P/N\sim O(1)$, where P and N are the training sample size and the network width respectively. We find that the statistics of the network predictor can be expressed in terms of kernels that undergo shape renormalization through a data-dependent matrix compared to the GP kernels. Our theory accurately captures the behavior of finite width GGDLNs trained with gradient descent dynamics. We show that kernel shape renormalization gives rise to rich generalization properties w.r.t. network width, depth and L2 regularization amplitude. Interestingly, networks with sufficient gating units behave similarly to standard ReLU networks. Although gatings in the model do not participate in supervised learning, we show the utility of unsupervised learning of the gating parameters. Additionally, our theory allows the evaluation of the network's ability for learning multiple tasks by incorporating task-relevant information into the gating units. In summary, our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width. The rich and diverse behavior of the GGDLNs suggests that they are helpful analytically tractable models of learning single and multiple tasks, in finite-width nonlinear deep networks.

LGJul 14, 2024
Order parameters and phase transitions of continual learning in deep neural networks

Haozhe Shan, Qianyi Li, Haim Sompolinsky · harvard

Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. While various techniques exist to mitigate forgetting, theoretical insights into when and why CL fails in NNs are lacking. Here, we present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks. It gives rise to order parameters (OPs) that capture how task relations and network architecture influence forgetting and anterograde interference, as verified by numerical evaluations. For networks with a shared readout for all tasks (single-head CL), the relevant-feature and rule similarity between tasks, respectively measured by two OPs, are sufficient to predict a wide range of CL behaviors. In addition, the theory predicts that increasing the network depth can effectively reduce interference between tasks, thereby lowering forgetting. For networks with task-specific readouts (multi-head CL), the theory identifies a phase transition where CL performance shifts dramatically as tasks become less similar, as measured by another task-similarity OP. While forgetting is relatively mild compared to single-head CL across all tasks, sufficiently low similarity leads to catastrophic anterograde interference, where the network retains old tasks perfectly but completely fails to generalize new learning. Our results delineate important factors affecting CL performance and suggest strategies for mitigating forgetting.

LGSep 8, 2023
Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics

Yehonatan Avidan, Qianyi Li, Haim Sompolinsky · harvard

Artificial neural networks have revolutionized machine learning in recent years, but a complete theoretical framework for their learning process is still lacking. Substantial advances were achieved for wide networks, within two disparate theoretical frameworks: the Neural Tangent Kernel (NTK), which assumes linearized gradient descent dynamics, and the Bayesian Neural Network Gaussian Process (NNGP). We unify these two theories using gradient descent learning with an additional noise in an ensemble of wide deep networks. We construct an analytical theory for the network input-output function and introduce a new time-dependent Neural Dynamical Kernel (NDK) from which both NTK and NNGP kernels are derived. We identify two learning phases: a gradient-driven learning phase, dominated by loss minimization, in which the time scale is governed by the initialization variance. It is followed by a slow diffusive learning stage, where the parameters sample the solution space, with a time constant decided by the noise and the Bayesian prior variance. The two variance parameters strongly affect the performance in the two regimes, especially in sigmoidal neurons. In contrast to the exponential convergence of the mean predictor in the initial phase, the convergence to the equilibrium is more complex and may behave nonmonotonically. By characterizing the diffusive phase, our work sheds light on representational drift in the brain, explaining how neural activity changes continuously without degrading performance, either by ongoing gradient signals that synchronize the drifts of different synapses or by architectural biases that generate task-relevant information that is robust against the drift process. This work closes the gap between the NTK and NNGP theories, providing a comprehensive framework for the learning process of deep wide neural networks and for analyzing dynamics in biological circuits.

NCApr 14, 2022
Optimal quadratic binding for relational reasoning in vector symbolic neural architectures

Naoki Hiratani, Haim Sompolinsky

Binding operation is fundamental to many cognitive processes, such as cognitive map formation, relational reasoning, and language comprehension. In these processes, two different modalities, such as location and objects, events and their contextual cues, and words and their roles, need to be bound together, but little is known about the underlying neural mechanisms. Previous works introduced a binding model based on quadratic functions of bound pairs, followed by vector summation of multiple pairs. Based on this framework, we address following questions: Which classes of quadratic matrices are optimal for decoding relational structures? And what is the resultant accuracy? We introduce a new class of binding matrices based on a matrix representation of octonion algebra, an eight-dimensional extension of complex numbers. We show that these matrices enable a more accurate unbinding than previously known methods when a small number of pairs are present. Moreover, numerical optimization of a binding operator converges to this octonion binding. We also show that when there are a large number of bound pairs, however, a random quadratic binding performs as well as the octonion and previously-proposed binding methods. This study thus provides new insight into potential neural mechanisms of binding operations in the brain.

NCJun 14, 2022
A theory of learning with constrained weight-distribution

Weishun Zhong, Ben Sorscher, Daniel D Lee et al.

A central question in computational neuroscience is how structure determines function in neural networks. The emerging high-quality large-scale connectomic datasets raise the question of what general functional principles can be gleaned from structural information such as the distribution of excitatory/inhibitory synapse types and the distribution of synaptic weights. Motivated by this question, we developed a statistical mechanical theory of learning in neural networks that incorporates structural information as constraints. We derived an analytical solution for the memory capacity of the perceptron, a basic feedforward model of supervised learning, with constraint on the distribution of its weights. Our theory predicts that the reduction in capacity due to the constrained weight-distribution is related to the Wasserstein distance between the imposed distribution and that of the standard normal distribution. To test the theoretical predictions, we use optimal transport theory and information geometry to develop an SGD-based algorithm to find weights that simultaneously learn the input-output task and satisfy the distribution constraint. We show that training in our algorithm can be interpreted as geodesic flows in the Wasserstein space of probability distributions. We further developed a statistical mechanical theory for teacher-student perceptron rule learning and ask for the best way for the student to incorporate prior knowledge of the rule. Our theory shows that it is beneficial for the learner to adopt different prior weight distributions during learning, and shows that distribution-constrained learning outperforms unconstrained and sign-constrained learning. Our theory and algorithm provide novel strategies for incorporating prior knowledge about weights into learning, and reveal a powerful connection between structure and function in neural networks.

LGJul 26, 2024
When narrower is better: the narrow width limit of Bayesian parallel branching neural networks

Zechen Zhang, Haim Sompolinsky

The infinite width limit of random neural networks is known to result in Neural Networks as Gaussian Process (NNGP) (Lee et al. (2018)), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al. (2019)). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Neural Network (BPB-NN), an architecture that resembles neural networks with residual blocks. We demonstrate that when the width of a BPB-NN is significantly smaller compared to the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-NN in the narrow width limit is generally superior to or comparable to that achieved in the wide width limit in bias-limited scenarios. Furthermore, the readout norms of each branch in the narrow width limit are mostly independent of the architectural hyperparameters but generally reflective of the nature of the data. We demonstrate such phenomenon primarily in the branching graph neural networks, where each branch represents a different order of convolutions of the graph; we also extend the results to other more general architectures such as the residual-MLP and demonstrate that the narrow width effect is a general feature of the branching networks. Our results characterize a newly defined narrow-width regime for parallel branching networks in general.

LGMay 24, 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie et al. · harvard

Despite the remarkable empirical performance of Transformers, their theoretical understanding remains elusive. Here, we consider a deep multi-head self-attention network, that is closely related to Transformers yet analytically tractable. We develop a statistical mechanics theory of Bayesian learning in this model, deriving exact equations for the network's predictor statistics under the finite-width thermodynamic limit, i.e., $N,P\rightarrow\infty$, $P/N=\mathcal{O}(1)$, where $N$ is the network width and $P$ is the number of training examples. Our theory shows that the predictor statistics are expressed as a sum of independent kernels, each one pairing different 'attention paths', defined as information pathways through different attention heads across layers. The kernels are weighted according to a 'task-relevant kernel combination' mechanism that aligns the total kernel with the task labels. As a consequence, this interplay between attention paths enhances generalization performance. Experiments confirm our findings on both synthetic and real-world sequence classification tasks. Finally, our theory explicitly relates the kernel combination mechanism to properties of the learned weights, allowing for a qualitative transfer of its insights to models trained via gradient descent. As an illustration, we demonstrate an efficient size reduction of the network, by pruning those attention heads that are deemed less relevant by our theory.

NCDec 21, 2023
Probing Biological and Artificial Neural Networks with Task-dependent Neural Manifolds

Michael Kuoch, Chi-Ning Chou, Nikhil Parthasarathy et al.

Recently, growth in our understanding of the computations performed in both biological and artificial neural networks has largely been driven by either low-level mechanistic studies or global normative approaches. However, concrete methodologies for bridging the gap between these levels of abstraction remain elusive. In this work, we investigate the internal mechanisms of neural networks through the lens of neural population geometry, aiming to provide understanding at an intermediate level of abstraction, as a way to bridge that gap. Utilizing manifold capacity theory (MCT) from statistical physics and manifold alignment analysis (MAA) from high-dimensional statistics, we probe the underlying organization of task-dependent manifolds in deep neural networks and macaque neural recordings. Specifically, we quantitatively characterize how different learning objectives lead to differences in the organizational strategies of these models and demonstrate how these geometric analyses are connected to the decodability of task-relevant information. These analyses present a strong direction for bridging mechanistic and normative theories in neural networks through neural population geometry, potentially opening up many future research avenues in both machine learning and neuroscience.

CLApr 30, 2025
Memorization and Knowledge Injection in Gated LLMs

Xu Pan, Ely Hahami, Zechen Zhang et al.

Large Language Models (LLMs) currently struggle to sequentially add new memories and integrate new knowledge. These limitations contrast with the human ability to continuously learn from new experiences and acquire knowledge throughout life. Most existing approaches add memories either through large context windows or external memory buffers (e.g., Retrieval-Augmented Generation), and studies on knowledge injection rarely test scenarios resembling everyday life events. In this work, we introduce a continual learning framework, Memory Embedded in Gated LLMs (MEGa), which injects event memories directly into the weights of LLMs. Each memory is stored in a dedicated set of gated low-rank weights. During inference, a gating mechanism activates relevant memory weights by matching query embeddings to stored memory embeddings. This enables the model to both recall entire memories and answer related questions. On two datasets - fictional characters and Wikipedia events - MEGa outperforms baseline approaches in mitigating catastrophic forgetting. Our model draws inspiration from the complementary memory system of the human brain.

LGNov 12, 2024
Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules

Binxu Wang, Jiaqi Shang, Haim Sompolinsky · harvard

Humans excel at discovering regular structures from limited samples and applying inferred rules to novel settings. We investigate whether modern generative models can similarly learn underlying rules from finite samples and perform reasoning through conditional sampling. Inspired by Raven's Progressive Matrices task, we designed GenRAVEN dataset, where each sample consists of three rows, and one of 40 relational rules governing the object position, number, or attributes applies to all rows. We trained generative models to learn the data distribution, where samples are encoded as integer arrays to focus on rule learning. We compared two generative model families: diffusion (EDM, DiT, SiT) and autoregressive models (GPT2, Mamba). We evaluated their ability to generate structurally consistent samples and perform panel completion via unconditional and conditional sampling. We found diffusion models excel at unconditional generation, producing more novel and consistent samples from scratch and memorizing less, but performing less well in panel completion, even with advanced conditional sampling methods. Conversely, autoregressive models excel at completing missing panels in a rule-consistent manner but generate less consistent samples unconditionally. We observe diverse data scaling behaviors: for both model families, rule learning emerges at a certain dataset size - around 1000s examples per rule. With more training data, diffusion models improve both their unconditional and conditional generation capabilities. However, for autoregressive models, while panel completion improves with more training data, unconditional generation consistency declines. Our findings highlight complementary capabilities and limitations of diffusion and autoregressive models in rule learning and reasoning tasks, suggesting avenues for further research into their mechanisms and potential for human-like reasoning.

CLOct 10, 2025
Closing the Data-Efficiency Gap Between Autoregressive and Masked Diffusion LLMs

Xu Pan, Ely Hahami, Jingxuan Fan et al.

Despite autoregressive large language models (arLLMs) being the current dominant paradigm in language modeling, they resist knowledge injection via fine-tuning due to inherent shortcomings such as the "reversal curse" -- the challenge of answering questions that reverse the original information order in the training sample. Masked diffusion large language models (dLLMs) are rapidly emerging as a powerful alternative to the arLLM paradigm, with evidence of better data efficiency and free of the "reversal curse" in pre-training. However, it is unknown whether these advantages extend to the post-training phase, i.e. whether pre-trained dLLMs can easily acquire new knowledge through fine-tuning. On three diverse datasets, we fine-tune arLLMs and dLLMs, evaluating them with forward and backward style Question Answering (QA) to probe knowledge generalization and the reversal curse. Our results confirm that arLLMs critically rely on extensive data augmentation via paraphrases for QA generalization, and paraphrases are only effective when their information order matches the QA style. Conversely, dLLMs achieve high accuracies on both forward and backward QAs without paraphrases; adding paraphrases yields only marginal gains. Lastly, inspired by the dLLM's performance, we introduce a novel masked fine-tuning paradigm for knowledge injection into pre-trained arLLMs. This proposed method successfully and drastically improves the data efficiency of arLLM fine-tuning, effectively closing the performance gap with dLLMs.

NCFeb 24, 2025
Unraveling the geometry of visual relational reasoning

Jiaqi Shang, Gabriel Kreiman, Haim Sompolinsky

Humans readily generalize abstract relations, such as recognizing "constant" in shape or color, whereas neural networks struggle, limiting their flexible reasoning. To investigate mechanisms underlying such generalization, we introduce SimplifiedRPM, a novel benchmark for systematically evaluating abstract relational reasoning, addressing limitations in prior datasets. In parallel, we conduct human experiments to quantify relational difficulty, enabling direct model-human comparisons. Testing four models, ResNet-50, Vision Transformer, Wild Relation Network, and Scattering Compositional Learner (SCL), we find that SCL generalizes best and most closely aligns with human behavior. Using a geometric approach, we identify key representation properties that accurately predict generalization and uncover a fundamental trade-off between signal and dimensionality: novel relations compress into training-induced subspaces. Layer-wise analysis reveals where relational structure emerges, highlights bottlenecks, and generates concrete hypotheses about abstract reasoning in the brain. Motivated by these insights, we propose SNRloss, a novel objective explicitly balancing representation geometry. Our results establish a geometric foundation for relational reasoning, paving the way for more human-like visual reasoning in AI and opening promising avenues for extending geometric analysis to broader cognitive tasks.

LGJun 24, 2024
Coding schemes in neural networks learning classification tasks

Alexander van Meegen, Haim Sompolinsky

Neural networks posses the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework where learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as scaling of weights and neuronal nonlinearity can profoundly influence the emergent representations.

LGDec 7, 2020
Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Kernel Renormalization

Qianyi Li, Haim Sompolinsky

The success of deep learning in many real-world tasks has triggered an intense effort to understand the power and limitations of deep learning in the training and generalization of complex tasks, so far with limited progress. In this work, we study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear. Despite the linearity of the units, learning in DLNNs is nonlinear, hence studying its properties reveals some of the features of nonlinear Deep Neural Networks (DNNs). Importantly, we solve exactly the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space. To do this, we introduce the Back-Propagating Kernel Renormalization (BPKR), which allows for the incremental integration of the network weights starting from the network output layer and progressing backward until the first layer's weights are integrated out. This procedure allows us to evaluate important network properties, such as its generalization error, the role of network width and depth, the impact of the size of the training set, and the effects of weight regularization and learning stochasticity. BPKR does not assume specific statistics of the input or the task's output. Furthermore, by performing partial integration of the layers, the BPKR allows us to compute the properties of the neural representations across the different hidden layers. We have proposed an extension of the BPKR to nonlinear DNNs with ReLU. Surprisingly, our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks in a wide regime of parameters. Our work is the first exact statistical mechanical study of learning in a family of DNNs, and the first successful theory of learning through successive integration of DoFs in the learned weight space.

DIS-NNAug 19, 2020
A new role for circuit expansion for learning in neural networks

Julia Steinberg, Madhu Advani, Haim Sompolinsky

Many sensory pathways in the brain rely on sparsely active populations of neurons downstream from the input stimuli. The biological reason for the occurrence of expanded structure in the brain is unclear, but may be because expansion can increase the expressive power of a neural network. In this work, we show that expanding a neural network can improve its generalization performance even in cases in which the expanded structure is pruned after the learning period. To study this setting we use a teacher-student framework where a perceptron teacher network generates labels which are corrupted with small amounts of noise. We then train a student network that is structurally matched to the teacher and can achieve optimal accuracy if given the teacher's synaptic weights. We find that sparse expansion of the input of a student perceptron network both increases its capacity and improves the generalization performance of the network when learning a noisy rule from a teacher perceptron when these expansions are pruned after learning. We find similar behavior when the expanded units are stochastic and uncorrelated with the input and analyze this network in the mean field limit. We show by solving the mean field equations that the generalization error of the stochastic expanded student network continues to drop as the size of the network increases. The improvement in generalization performance occurs despite the increased complexity of the student network relative to the teacher it is trying to learn. We show that this effect is closely related to the addition of slack variables in artificial neural networks and suggest possible implications for artificial and biological neural networks.

MLApr 2, 2020
Predicting the outputs of finite deep neural networks trained with noisy gradients

Gadi Naveh, Oded Ben-David, Haim Sompolinsky et al.

A recent line of works studied wide deep neural networks (DNNs) by approximating them as Gaussian Processes (GPs). A DNN trained with gradient flow was shown to map to a GP governed by the Neural Tangent Kernel (NTK), whereas earlier works showed that a DNN with an i.i.d. prior over its weights maps to the so-called Neural Network Gaussian Process (NNGP). Here we consider a DNN training protocol, involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian stochastic process. An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width. Our contribution is three-fold: (i) In the infinite width limit, we establish a correspondence between DNNs trained with noisy gradients and the NNGP, not the NTK. (ii) We provide a general analytical form for the finite width correction (FWC) for DNNs with arbitrary activation functions and depth and use it to predict the outputs of empirical finite networks with high accuracy. Analyzing the FWC behavior as a function of $n$, the training set size, we find that it is negligible for both the very small $n$ regime, and, surprisingly, for the large $n$ regime (where the GP error scales as $O(1/n)$). (iii) We flesh out algebraically how these FWCs can improve the performance of finite convolutional neural networks (CNNs) relative to their GP counterparts on image classification tasks.

DIS-NNOct 17, 2017
Classification and Geometry of General Perceptual Manifolds

SueYeon Chung, Daniel D. Lee, Haim Sompolinsky

Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination requires classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to invariant object classification and recognition is a fundamental problem in brain theory as well as in machine learning. Here we study the ability of a readout network to classify objects from their perceptual manifold representations. We develop a statistical mechanical theory for the linear classification of manifolds with arbitrary geometry revealing a remarkable relation to the mathematics of conic decomposition. Novel geometrical measures of manifold radius and manifold dimension are introduced which can explain the classification capacity for manifolds of various geometries. The general theory is demonstrated on a number of representative manifolds, including L2 ellipsoids prototypical of strictly convex manifolds, L1 balls representing polytopes consisting of finite sample points, and orientation manifolds which arise from neurons tuned to respond to a continuous angle variable, such as object orientation. The effects of label sparsity on the classification capacity of manifolds are elucidated, revealing a scaling relation between label sparsity and manifold radius. Theoretical predictions are corroborated by numerical simulations using recently developed algorithms to compute maximum margin solutions for manifold dichotomies. Our theory and its extensions provide a powerful and rich framework for applying statistical mechanics of linear classification to data arising from neuronal responses to object stimuli, as well as to artificial deep networks trained for object recognition tasks.

LGMay 28, 2017
Learning Data Manifolds with a Cutting Plane Method

SueYeon Chung, Uri Cohen, Haim Sompolinsky et al.

We consider the problem of classifying data manifolds where each manifold represents invariances that are parameterized by continuous degrees of freedom. Conventional data augmentation methods rely upon sampling large numbers of training examples from these manifolds; instead, we propose an iterative algorithm called M_{CP} based upon a cutting-plane approach that efficiently solves a quadratic semi-infinite programming problem to find the maximum margin solution. We provide a proof of convergence as well as a polynomial bound on the number of iterations required for a desired tolerance in the objective function. The efficiency and performance of M_{CP} are demonstrated in high-dimensional simulations and on image manifolds generated from the ImageNet dataset. Our results indicate that M_{CP} is able to rapidly learn good classifiers and shows superior generalization performance compared with conventional maximum margin methods using data augmentation methods.

NCMay 3, 2017
Balanced Excitation and Inhibition are Required for High-Capacity, Noise-Robust Neuronal Selectivity

Ran Rubin, L. F. Abbott, Haim Sompolinsky

Neurons and networks in the cerebral cortex must operate reliably despite multiple sources of noise. To evaluate the impact of both input and output noise, we determine the robustness of single-neuron stimulus selective responses, as well as the robustness of attractor states of networks of neurons performing memory tasks. We find that robustness to output noise requires synaptic connections to be in a balanced regime in which excitation and inhibition are strong and largely cancel each other. We evaluate the conditions required for this regime to exist and determine the properties of networks operating within it. A plausible synaptic plasticity rule for learning that balances weight configurations is presented. Our theory predicts an optimal ratio of the number of excitatory and inhibitory synapses for maximizing the encoding capacity of balanced networks for a given statistics of afferent activations. Previous work has shown that balanced networks amplify spatio-temporal variability and account for observed asynchronous irregular states. Here we present a novel type of balanced network that amplifies small changes in the impinging signals, and emerges automatically from learning to perform neuronal and network functions robustly.

DIS-NNDec 6, 2015
Linear Readout of Object Manifolds

SueYeon Chung, Daniel D. Lee, Haim Sompolinsky

Objects are represented in sensory systems by continuous manifolds due to sensitivity of neuronal responses to changes in physical features such as location, orientation, and intensity. What makes certain sensory representations better suited for invariant decoding of objects by downstream networks? We present a theory that characterizes the ability of a linear readout network, the perceptron, to classify objects from variable neural responses. We show how the readout perceptron capacity depends on the dimensionality, size, and shape of the object manifolds in its input neural representation.