Terrence J. Sejnowski

NC
h-index38
10papers
1,040citations
Novelty41%
AI Score37

10 Papers

CYMay 3, 2025
The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI

Barbara Oakley, Michael Johnston, Ken-Zen Chen et al.

In the age of generative AI and ubiquitous digital tools, human cognition faces a structural paradox: as external aids become more capable, internal memory systems risk atrophy. Drawing on neuroscience and cognitive psychology, this paper examines how heavy reliance on AI systems and discovery-based pedagogies may impair the consolidation of declarative and procedural memory -- systems essential for expertise, critical thinking, and long-term retention. We review how tools like ChatGPT and calculators can short-circuit the retrieval, error correction, and schema-building processes necessary for robust neural encoding. Notably, we highlight striking parallels between deep learning phenomena such as "grokking" and the neuroscience of overlearning and intuition. Empirical studies are discussed showing how premature reliance on AI during learning inhibits proceduralization and intuitive mastery. We argue that effective human-AI interaction depends on strong internal models -- biological "schemata" and neural manifolds -- that enable users to evaluate, refine, and guide AI output. The paper concludes with policy implications for education and workforce training in the age of large language models.

NCDec 17, 2025
Dynamical Mechanisms for Coordinating Long-term Working Memory Based on the Precision of Spike-timing in Cortical Neurons

Terrence J. Sejnowski

In the last century, most sensorimotor studies of cortical neurons relied on average firing rates. Rate coding is efficient for fast sensorimotor processing that occurs within a few seconds. Much less is known about the neural mechanisms underlying long-term working memory with a time scale of hours (Ericsson and Kintsch, 1995). Cognitive states may not have sensory or motor correlates. For example, you can sit in a quiet room making plans without moving or sensory processing. You can also make plans while out walking. This suggests that the neural substrate for cognitive states neither depends on nor interferes with ongoing sensorimotor brain activity. In this perspective, I make the case for a possible second tier of neural activity that coexists with the well-established sensorimotor tier, based on coordinated spike-timing activity. The discovery of millisecond-precision spike initiation in cortical neurons was unexpected (Mainen and Sejnowski, 1995). Even more striking was the precision of spiking in vivo, in response to rapidly fluctuating sensory inputs, suggesting that neural circuits could preserve and manipulate sensory information through spike timing. High temporal resolution enables a broader range of neural codes. The relative timing of spikes between presynaptic and postsynaptic neurons in the millisecond range triggers spike-timing-dependent plasticity (STDP). What spike-timing mechanisms could engage STDP in vivo? Cortical traveling waves have been observed across many frequency bands with high temporal precision, and neural mechanisms can plausibly enable traveling waves to trigger STDP lasting for hours in cortical neurons. This temporary cortical network, riding astride the long-term sensorimotor network, could support cognitive processing and long-term working memory.

CLJan 25, 2024
Transformers and Cortical Waves: Encoders for Pulling In Context Across Time

Lyle Muller, Patricia S. Churchland, Terrence J. Sejnowski

The capabilities of transformer networks such as ChatGPT and other Large Language Models (LLMs) have captured the world's attention. The crucial computational mechanism underlying their performance relies on transforming a complete input sequence - for example, all the words in a sentence - into a long "encoding vector" that allows transformers to learn long-range temporal dependencies in naturalistic sequences. Specifically, "self-attention" applied to this encoding vector enhances temporal context in transformers by computing associations between pairs of words in the input sequence. We suggest that waves of neural activity traveling across single cortical areas or multiple regions at the whole-brain scale could implement a similar encoding principle. By encapsulating recent input history into a single spatial pattern at each moment in time, cortical waves may enable temporal context to be extracted from sequences of sensory inputs, the same computational principle used in transformers.

NCApr 1, 2021
Replay in Deep Learning: Current Approaches and Missing Biological Elements

Tyler L. Hayes, Giri P. Krishnan, Maxim Bazhenov et al.

Replay is the reactivation of one or more neural patterns, which are similar to the activation patterns experienced during past waking experiences. Replay was first observed in biological neural networks during sleep, and it is now thought to play a critical role in memory formation, retrieval, and consolidation. Replay-like mechanisms have been incorporated into deep artificial neural networks that learn over time to avoid catastrophic forgetting of previous knowledge. Replay algorithms have been successfully used in a wide range of deep learning methods within supervised, unsupervised, and reinforcement learning paradigms. In this paper, we provide the first comprehensive comparison between replay in the mammalian brain and replay in artificial neural networks. We identify multiple aspects of biological replay that are missing in deep learning systems and hypothesize how they could be utilized to improve artificial neural networks.

NCFeb 12, 2020
The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence

Terrence J. Sejnowski

Deep learning networks have been trained to recognize speech, caption photographs and translate text between languages at high levels of performance. Although applications of deep learning networks to real world problems have become ubiquitous, our understanding of why they are so effective is lacking. These empirical results should not be possible according to sample complexity in statistics and non-convex optimization theory. However, paradoxes in the training and effectiveness of deep learning networks are being investigated and insights are being found in the geometry of high-dimensional spaces. A mathematical theory of deep learning would illuminate how they function, allow us to assess the strengths and weaknesses of different network architectures and lead to major improvements. Deep learning has provided natural ways for humans to communicate with digital devices and is foundational for building artificial general intelligence. Deep learning was inspired by the architecture of the cerebral cortex and insights into autonomy and general intelligence may be found in other brain regions that are essential for planning and survival, but major breakthroughs will be needed to achieve these goals.

LGMay 16, 2019
Utilizing Deep Learning Towards Multi-modal Bio-sensing and Vision-based Affective Computing

Siddharth Siddharth, Tzyy-Ping Jung, Terrence J. Sejnowski

In recent years, the use of bio-sensing signals such as electroencephalogram (EEG), electrocardiogram (ECG), etc. have garnered interest towards applications in affective computing. The parallel trend of deep-learning has led to a huge leap in performance towards solving various vision-based research problems such as object detection. Yet, these advances in deep-learning have not adequately translated into bio-sensing research. This work applies novel deep-learning-based methods to various bio-sensing and video data of four publicly available multi-modal emotion datasets. For each dataset, we first individually evaluate the emotion-classification performance obtained by each modality. We then evaluate the performance obtained by fusing the features from these modalities. We show that our algorithms outperform the results reported by other studies for emotion/valence/arousal/liking classification on DEAP and MAHNOB-HCI datasets and set up benchmarks for the newer AMIGOS and DREAMER datasets. We also evaluate the performance of our algorithms by combining the datasets and by using transfer learning to show that the proposed method overcomes the inconsistencies between the datasets. Hence, we do a thorough analysis of multi-modal affective data from more than 120 subjects and 2,800 trials. Finally, utilizing a convolution-deconvolution network, we propose a new technique towards identifying salient brain regions corresponding to various affective states.

HCApr 25, 2018
Multi-modal Approach for Affective Computing

Siddharth Siddharth, Tzyy-Ping Jung, Terrence J. Sejnowski

Throughout the past decade, many studies have classified human emotions using only a single sensing modality such as face video, electroencephalogram (EEG), electrocardiogram (ECG), galvanic skin response (GSR), etc. The results of these studies are constrained by the limitations of these modalities such as the absence of physiological biomarkers in the face-video analysis, poor spatial resolution in EEG, poor temporal resolution of the GSR etc. Scant research has been conducted to compare the merits of these modalities and understand how to best use them individually and jointly. Using multi-modal AMIGOS dataset, this study compares the performance of human emotion classification using multiple computational approaches applied to face videos and various bio-sensing modalities. Using a novel method for compensating physiological baseline we show an increase in the classification accuracy of various approaches that we use. Finally, we present a multi-modal emotion-classification approach in the domain of affective computing research.

HCFeb 22, 2018
An Affordable Bio-Sensing and Activity Tagging Platform for HCI Research

Siddharth, Aashish Patel, Tzyy-Ping Jung et al.

We present a novel multi-modal bio-sensing platform capable of integrating multiple data streams for use in real-time applications. The system is composed of a central compute module and a companion headset. The compute node collects, time-stamps and transmits the data while also providing an interface for a wide range of sensors including electroencephalogram, photoplethysmogram, electrocardiogram, and eye gaze among others. The companion headset contains the gaze tracking cameras. By integrating many of the measurements systems into an accessible package, we are able to explore previously unanswerable questions ranging from open-environment interactions to emotional response studies. Though some of the integrated sensors are designed from the ground-up to fit into a compact form factor, we validate the accuracy of the sensors and find that they perform similarly to, and in some cases better than, alternatives.

NCJun 14, 2017
Gradient Descent for Spiking Neural Networks

Dongsung Huh, Terrence J. Sejnowski

Much of studies on neural computation are based on network models of static neurons that produce analog output, despite the fact that information processing in the brain is predominantly carried out by dynamic neurons that produce discrete pulses called spikes. Research in spike-based computation has been impeded by the lack of efficient supervised learning algorithm for spiking networks. Here, we present a gradient descent method for optimizing spiking network models by introducing a differentiable formulation of spiking networks and deriving the exact gradient calculation. For demonstration, we trained recurrent spiking networks on two dynamic tasks: one that requires optimizing fast (~millisecond) spike-based interactions for efficient encoding of information, and a delayed memory XOR task over extended duration (~second). The results show that our method indeed optimizes the spiking network dynamics on the time scale of individual spikes as well as behavioral time scales. In conclusion, our result offers a general purpose supervised learning algorithm for spiking neural networks, thus advancing further investigations on spike-based computation.

MLOct 27, 2015
The Wilson Machine for Image Modeling

Saeed Saremi, Terrence J. Sejnowski

Learning the distribution of natural images is one of the hardest and most important problems in machine learning. The problem remains open, because the enormous complexity of the structures in natural images spans all length scales. We break down the complexity of the problem and show that the hierarchy of structures in natural images fuels a new class of learning algorithms based on the theory of critical phenomena and stochastic processes. We approach this problem from the perspective of the theory of critical phenomena, which was developed in condensed matter physics to address problems with infinite length-scale fluctuations, and build a framework to integrate the criticality of natural images into a learning algorithm. The problem is broken down by mapping images into a hierarchy of binary images, called bitplanes. In this representation, the top bitplane is critical, having fluctuations in structures over a vast range of scales. The bitplanes below go through a gradual stochastic heating process to disorder. We turn this representation into a directed probabilistic graphical model, transforming the learning problem into the unsupervised learning of the distribution of the critical bitplane and the supervised learning of the conditional distributions for the remaining bitplanes. We learnt the conditional distributions by logistic regression in a convolutional architecture. Conditioned on the critical binary image, this simple architecture can generate large, natural-looking images, with many shades of gray, without the use of hidden units, unprecedented in the studies of natural images. The framework presented here is a major step in bringing criticality and stochastic processes to machine learning and in studying natural image statistics.