NCSep 9, 2022
Deep learning in a bilateral brain with hemispheric specializationChandramouli Rajagopalan, David Rawlinson, Elkhonon Goldberg et al.
The brains of all bilaterally symmetric animals on Earth are divided into left and right hemispheres. The anatomy and functionality of the hemispheres have a large degree of overlap, but there are asymmetries, and they specialise in possesses different attributes. Other authors have used computational models to mimic hemispheric asymmetries with a focus on reproducing human data on semantic and visual processing tasks. We took a different approach and aimed to understand how dual hemispheres in a bilateral architecture interact to perform well in a given task. We propose a bilateral artificial neural network that imitates lateralisation observed in nature: that the left hemisphere specialises in local features and the right in global features. We used different training objectives to achieve the desired specialisation and tested it on an image classification task with two different CNN backbones: ResNet and VGG. Our analysis found that the hemispheres represent complementary features that are exploited by a network head that implements a type of weighted attention. The bilateral architecture outperformed a range of baselines of similar representational capacity that do not exploit differential specialisation, with the exception of a conventional ensemble of unilateral networks trained on dual training objectives for local and global features. The results demonstrate the efficacy of bilateralism, contribute to the discussion of bilateralism in biological brains, and the principle may serve as an inductive bias for new AI systems.
NEAug 26, 2022
Expanding continual few-shot learning benchmarks to include recognition of specific instancesGideon Kowadlo, Abdelrahman Ahmed, Amir Mayan et al.
Continual learning and few-shot learning are important frontiers in progress toward broader Machine Learning (ML) capabilities. Recently, there has been intense interest in combining both. One of the first examples to do so was the Continual few-shot Learning (CFSL) framework of Antoniou et al. arXiv:2004.11967. In this study, we extend CFSL in two ways that capture a broader range of challenges, important for intelligent agent behaviour in real-world conditions. First, we increased the number of classes by an order of magnitude, making the results more comparable to standard continual learning experiments. Second, we introduced an 'instance test' which requires recognition of specific instances of classes -- a capability of animal cognition that is usually neglected in ML. For an initial exploration of ML model performance under these conditions, we selected representative baseline models from the original CFSL work and added a model variant with replay. As expected, learning more classes is more difficult than the original CFSL experiments, and interestingly, the way in which image instances and classes are presented affects classification performance. Surprisingly, accuracy in the baseline instance test is comparable to other classification tasks, but poor given significant occlusion and noise. The use of replay for consolidation substantially improves performance for both types of tasks, but particularly for the instance test.
LGFeb 22
Active perception and disentangled representations allow continual, episodic zero and few-shot learningDavid Rawlinson, Gideon Kowadlo
Generalization is often regarded as an essential property of machine learning systems. However, perhaps not every component of a system needs to generalize. Training models for generalization typically produces entangled representations at the boundaries of entities or classes, which can lead to destructive interference when rapid, high-magnitude updates are required for continual or few-shot learning. Techniques for fast learning with non-interfering representations exist, but they generally fail to generalize. Here, we describe a Complementary Learning System (CLS) in which the fast learner entirely foregoes generalization in exchange for continual zero-shot and few-shot learning. Unlike most CLS approaches, which use episodic memory primarily for replay and consolidation, our fast, disentangled learner operates as a parallel reasoning system. The fast learner can overcome observation variability and uncertainty by leveraging a conventional slow, statistical learner within an active perception system: A contextual bias provided by the fast learner induces the slow learner to encode novel stimuli in familiar, generalized terms, enabling zero-shot and few-shot learning. This architecture demonstrates that fast, context-driven reasoning can coexist with slow, structured generalization, providing a pathway for robust continual learning.
AIOct 26, 2025
HRM-Agent: Training a recurrent reasoning model in dynamic environments using reinforcement learningLong H Dang, David Rawlinson
The Hierarchical Reasoning Model (HRM) has impressive reasoning abilities given its small size, but has only been applied to supervised, static, fully-observable problems. One of HRM's strengths is its ability to adapt its computational effort to the difficulty of the problem. However, in its current form it cannot integrate and reuse computation from previous time-steps if the problem is dynamic, uncertain or partially observable, or be applied where the correct action is undefined, characteristics of many real-world problems. This paper presents HRM-Agent, a variant of HRM trained using only reinforcement learning. We show that HRM can learn to navigate to goals in dynamic and uncertain maze environments. Recent work suggests that HRM's reasoning abilities stem from its recurrent inference process. We explore the dynamics of the recurrent inference process and find evidence that it is successfully reusing computation from earlier environment time-steps.
LGFeb 15, 2021
One-shot learning for the long term: consolidation with an artificial hippocampal algorithmGideon Kowadlo, Abdelrahman Ahmed, David Rawlinson
Standard few-shot experiments involve learning to efficiently match previously unseen samples by class. We claim that few-shot learning should be long term, assimilating knowledge for the future, without forgetting previous concepts. In the mammalian brain, the hippocampus is understood to play a significant role in this process, by learning rapidly and consolidating knowledge to the neocortex incrementally over a short period. In this research we tested whether an artificial hippocampal algorithm (AHA), could be used with a conventional Machine Learning (ML) model that learns incrementally analogous to the neocortex, to achieve one-shot learning both short and long term. The results demonstrated that with the addition of AHA, the system could learn in one-shot and consolidate the knowledge for the long term without catastrophic forgetting. This study is one of the first examples of using a CLS model of hippocampus to consolidate memories, and it constitutes a step toward few-shot continual learning.
LGOct 30, 2020
Unsupervised One-shot Learning of Both Specific Instances and Generalised Classes with a Hippocampal ArchitectureGideon Kowadlo, Abdelrahman Ahmed, David Rawlinson
Established experimental procedures for one-shot machine learning do not test the ability to learn or remember specific instances of classes, a key feature of animal intelligence. Distinguishing specific instances is necessary for many real-world tasks, such as remembering which cup belongs to you. Generalisation within classes conflicts with the ability to separate instances of classes, making it difficult to achieve both capabilities within a single architecture. We propose an extension to the standard Omniglot classification-generalisation framework that additionally tests the ability to distinguish specific instances after one exposure and introduces noise and occlusion corruption. Learning is defined as an ability to classify as well as recall training samples. Complementary Learning Systems (CLS) is a popular model of mammalian brain regions believed to play a crucial role in learning from a single exposure to a stimulus. We created an artificial neural network implementation of CLS and applied it to the extended Omniglot benchmark. Our unsupervised model demonstrates comparable performance to existing supervised ANNs on the Omniglot classification task (requiring generalisation), without the need for domain-specific inductive biases. On the extended Omniglot instance-recognition task, the same model also demonstrates significantly better performance than a baseline nearest-neighbour approach, given partial occlusion and noise.
LGDec 2, 2019
Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence ModelingJeremy Gordon, David Rawlinson, Subutai Ahmad
In sequence learning tasks such as language modelling, Recurrent Neural Networks must learn relationships between input features separated by time. State of the art models such as LSTM and Transformer are trained by backpropagation of losses into prior hidden states and inputs held in memory. This allows gradients to flow from present to past and effectively learn with perfect hindsight, but at a significant memory cost. In this paper we show that it is possible to train high performance recurrent networks using information that is local in time, and thereby achieve a significantly reduced memory footprint. We describe a predictive autoencoder called bRSM featuring recurrent connections, sparse activations, and a boosting rule for improved cell utilization. The architecture demonstrates near optimal performance on a non-deterministic (stochastic) partially-observable sequence learning task consisting of high-Markov-order sequences of MNIST digits. We find that this model learns these sequences faster and more completely than an LSTM, and offer several possible explanations why the LSTM architecture might struggle with the partially observable sequence structure in this task. We also apply our model to a next word prediction task on the Penn Treebank (PTB) dataset. We show that a 'flattened' RSM network, when paired with a modern semantic word embedding and the addition of boosting, achieves 103.5 PPL (a 20-point improvement over the best N-gram models), beating ordinary RNNs trained with BPTT and approaching the scores of early LSTM implementations. This work provides encouraging evidence that strong results on challenging tasks such as language modelling may be possible using less memory intensive, biologically-plausible training regimes.
NESep 23, 2019
AHA! an 'Artificial Hippocampal Algorithm' for Episodic Machine LearningGideon Kowadlo, Abdelrahman Ahmed, David Rawlinson
The majority of ML research concerns slow, statistical learning of i.i.d. samples from large, labelled datasets. Animals do not learn this way. An enviable characteristic of animal learning is `episodic' learning - the ability to memorise a specific experience as a composition of existing concepts, after just one experience, without provided labels. The new knowledge can then be used to distinguish between similar experiences, to generalise between classes, and to selectively consolidate to long-term memory. The Hippocampus is known to be vital to these abilities. AHA is a biologically-plausible computational model of the Hippocampus. Unlike most machine learning models, AHA is trained without external labels and uses only local credit assignment. We demonstrate AHA in a superset of the Omniglot one-shot classification benchmark. The extended benchmark covers a wider range of known hippocampal functions by testing pattern separation, completion, and recall of original input. These functions are all performed within a single configuration of the computational model. Despite these constraints, image classification results are comparable to conventional deep convolutional ANNs.
MLMay 28, 2019
Learning distant cause and effect using only local and immediate credit assignmentDavid Rawlinson, Abdelrahman Ahmed, Gideon Kowadlo
We present a recurrent neural network memory that uses sparse coding to create a combinatoric encoding of sequential inputs. Using several examples, we show that the network can associate distant causes and effects in a discrete stochastic process, predict partially-observable higher-order sequences, and enable a DQN agent to navigate a maze by giving it memory. The network uses only biologically-plausible, local and immediate credit assignment. Memory requirements are typically one order of magnitude less than existing LSTM, GRU and autoregressive feed-forward sequence learning models. The most significant limitation of the memory is generalization to unseen input sequences. We explore this limitation by measuring next-word prediction perplexity on the Penn Treebank dataset.
CVApr 17, 2018
Sparse Unsupervised Capsules Generalize BetterDavid Rawlinson, Abdelrahman Ahmed, Gideon Kowadlo
We show that unsupervised training of latent capsule layers using only the reconstruction loss, without masking to select the correct output class, causes a loss of equivariances and other desirable capsule qualities. This implies that supervised capsules networks can't be very deep. Unsupervised sparsening of latent capsule layer activity both restores these qualities and appears to generalize better than supervised masking, while potentially enabling deeper capsules networks. We train a sparse, unsupervised capsules network of similar geometry to Sabour et al (2017) on MNIST, and then test classification accuracy on affNIST using an SVM layer. Accuracy is improved from benchmark 79% to 90%.