ROJul 9, 2024
Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation ChallengeSriram Yenamandra, Arun Ramachandran, Mukul Khanna et al. · cmu
In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only an 0.8% success rate; by the end of the competition, the best participants achieved an 10.8\% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.
NCSep 3, 2024
Decoding finger velocity from cortical spike trains with recurrent spiking neural networksTengjun Liu, Julia Gygax, Julian Rossbroich et al.
Invasive cortical brain-machine interfaces (BMIs) can significantly improve the life quality of motor-impaired patients. Nonetheless, externally mounted pedestals pose an infection risk, which calls for fully implanted systems. Such systems, however, must meet strict latency and energy constraints while providing reliable decoding performance. While recurrent spiking neural networks (RSNNs) are ideally suited for ultra-low-power, low-latency processing on neuromorphic hardware, it is unclear whether they meet the above requirements. To address this question, we trained RSNNs to decode finger velocity from cortical spike trains (CSTs) of two macaque monkeys. First, we found that a large RSNN model outperformed existing feedforward spiking neural networks (SNNs) and artificial neural networks (ANNs) in terms of their decoding accuracy. We next developed a tiny RSNN with a smaller memory footprint, low firing rates, and sparse connectivity. Despite its reduced computational requirements, the resulting model performed substantially better than existing SNN and ANN decoders. Our results thus demonstrate that RSNNs offer competitive CST decoding performance under tight resource constraints and are promising candidates for fully implanted ultra-low-power BMIs with the potential to revolutionize patient care.
NCMar 14, 2023
Emergent Bio-Functional Similarities in a Cortical-Spike-Train-Decoding Spiking Neural Network Facilitate Predictions of Neural ComputationTengjun Liu, Yansong Chua, Yiwei Zhang et al.
Despite its better bio-plausibility, goal-driven spiking neural network (SNN) has not achieved applicable performance for classifying biological spike trains, and showed little bio-functional similarities compared to traditional artificial neural networks. In this study, we proposed the motorSRNN, a recurrent SNN topologically inspired by the neural motor circuit of primates. By employing the motorSRNN in decoding spike trains from the primary motor cortex of monkeys, we achieved a good balance between classification accuracy and energy consumption. The motorSRNN communicated with the input by capturing and cultivating more cosine-tuning, an essential property of neurons in the motor cortex, and maintained its stability during training. Such training-induced cultivation and persistency of cosine-tuning was also observed in our monkeys. Moreover, the motorSRNN produced additional bio-functional similarities at the single-neuron, population, and circuit levels, demonstrating biological authenticity. Thereby, ablation studies on motorSRNN have suggested long-term stable feedback synapses contribute to the training-induced cultivation in the motor cortex. Besides these novel findings and predictions, we offer a new framework for building authentic models of neural computation.
CVAug 14, 2022
Gradient Mask: Lateral Inhibition Mechanism Improves Performance in Artificial Neural NetworksLei Jiang, Yongqing Liu, Shihai Xiao et al.
Lateral inhibitory connections have been observed in the cortex of the biological brain, and has been extensively studied in terms of its role in cognitive functions. However, in the vanilla version of backpropagation in deep learning, all gradients (which can be understood to comprise of both signal and noise gradients) flow through the network during weight updates. This may lead to overfitting. In this work, inspired by biological lateral inhibition, we propose Gradient Mask, which effectively filters out noise gradients in the process of backpropagation. This allows the learned feature information to be more intensively stored in the network while filtering out noisy or unimportant features. Furthermore, we demonstrate analytically how lateral inhibition in artificial neural networks improves the quality of propagated gradients. A new criterion for gradient quality is proposed which can be used as a measure during training of various convolutional neural networks (CNNs). Finally, we conduct several different experiments to study how Gradient Mask improves the performance of the network both quantitatively and qualitatively. Quantitatively, accuracy in the original CNN architecture, accuracy after pruning, and accuracy after adversarial attacks have shown improvements. Qualitatively, the CNN trained using Gradient Mask has developed saliency maps that focus primarily on the object of interest, which is useful for data augmentation and network interpretability.
SDSep 7, 2023
Spiking Structured State Space Model for Monaural Speech EnhancementYu Du, Xu Liu, Yansong Chua
Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: efficiently using information in long speech sequences and high computational costs. To address these, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNN) with the long-range sequence modeling capabilities of Structured State Space Models (S4), offering a compelling solution. Evaluation on the DNS Challenge and VoiceBank+Demand Datasets confirms that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods but with fewer computational resources, as evidenced by reduced parameters and Floating Point Operations (FLOPs).
CVMay 19, 2025Code
MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention FusionWei Hua, Chenlin Zhou, Jibin Wu et al.
The combination of Spiking Neural Networks (SNNs) with Vision Transformer architectures has garnered significant attention due to their potential for energy-efficient and high-performance computing paradigms. However, a substantial performance gap still exists between SNN-based and ANN-based transformer architectures. While existing methods propose spiking self-attention mechanisms that are successfully combined with SNNs, the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting features from different image scales. In this paper, we address this issue and propose MSVIT. This novel spike-driven Transformer architecture firstly uses multi-scale spiking attention (MSSA) to enhance the capabilities of spiking attention blocks. We validate our approach across various main datasets. The experimental results show that MSVIT outperforms existing SNN-based models, positioning itself as a state-of-the-art solution among SNN-transformer architectures. The codes are available at https://github.com/Nanhu-AI-Lab/MSViT.
LGJul 12, 2021
SoftHebb: Bayesian Inference in Unsupervised Hebbian Soft Winner-Take-All NetworksTimoleon Moraitis, Dmitry Toichkin, Adrien Journé et al.
Hebbian plasticity in winner-take-all (WTA) networks is highly attractive for neuromorphic on-chip learning, owing to its efficient, local, unsupervised, and on-line nature. Moreover, its biological plausibility may help overcome important limitations of artificial algorithms, such as their susceptibility to adversarial attacks, and their high demands for training-example quantity and repetition. However, Hebbian WTA learning has found little use in machine learning (ML), likely because it has been missing an optimization theory compatible with deep learning (DL). Here we show rigorously that WTA networks constructed by standard DL elements, combined with a Hebbian-like plasticity that we derive, maintain a Bayesian generative model of the data. Importantly, without any supervision, our algorithm, SoftHebb, minimizes cross-entropy, i.e. a common loss function in supervised DL. We show this theoretically and in practice. The key is a "soft" WTA where there is no absolute "hard" winner neuron. Strikingly, in shallow-network comparisons with backpropagation (BP), SoftHebb shows advantages beyond its Hebbian efficiency. Namely, it converges in fewer iterations, and is significantly more robust to noise and adversarial attacks. Notably, attacks that maximally confuse SoftHebb are also confusing to the human eye, potentially linking human perceptual robustness, with Hebbian WTA circuits of cortex. Finally, SoftHebb can generate synthetic objects as interpolations of real object classes. All in all, Hebbian efficiency, theoretical underpinning, cross-entropy-minimization, and surprising empirical advantages, suggest that SoftHebb may inspire highly neuromorphic and radically different, but practical and advantageous learning algorithms and hardware accelerators.
NEMar 26, 2020
Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural NetworksMalu Zhang, Jiadong Wang, Burin Amornpaisannon et al.
Spiking Neural Networks (SNNs) use spatio-temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation. Motivated by the success of deep learning, the study of Deep Spiking Neural Networks (DeepSNNs) provides promising directions for artificial intelligence applications. However, training of DeepSNNs is not straightforward because the well-studied error back-propagation (BP) algorithm is not directly applicable. In this paper, we first establish an understanding as to why error back-propagation does not work well in DeepSNNs. To address this problem, we propose a simple yet efficient Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons and propose a Spike-Timing-Dependent Back-Propagation (STDBP) learning algorithm for DeepSNNs. In STDBP algorithm, the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner. Our experimental results show that the proposed learning algorithm achieves state-of-the-art classification accuracy in single spike time based learning algorithms of DeepSNNs. Furthermore, by utilizing the trained model parameters obtained from the proposed STDBP learning algorithm, we demonstrate the ultra-low-power inference operations on a recently proposed neuromorphic inference accelerator. Experimental results show that the neuromorphic hardware consumes 0.751~mW of the total power consumption and achieves a low latency of 47.71~ms to classify an image from the MNIST dataset. Overall, this work investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective to design of future DeepSNNs and neuromorphic hardware systems.
IVDec 28, 2019
Transfer Learning in General Lensless Imaging through Scattering MediaYukuan Yang, Lei Deng, Peng Jiao et al.
Recently deep neural networks (DNNs) have been successfully introduced to the field of lensless imaging through scattering media. By solving an inverse problem in computational imaging, DNNs can overcome several shortcomings in the conventional lensless imaging through scattering media methods, namely, high cost, poor quality, complex control, and poor anti-interference. However, for training, a large number of training samples on various datasets have to be collected, with a DNN trained on one dataset generally performing poorly for recovering images from another dataset. The underlying reason is that lensless imaging through scattering media is a high dimensional regression problem and it is difficult to obtain an analytical solution. In this work, transfer learning is proposed to address this issue. Our main idea is to train a DNN on a relatively complex dataset using a large number of training samples and fine-tune the last few layers using very few samples from other datasets. Instead of the thousands of samples required to train from scratch, transfer learning alleviates the problem of costly data acquisition. Specifically, considering the difference in sample sizes and similarity among datasets, we propose two DNN architectures, namely LISMU-FCN and LISMU-OCN, and a balance loss function designed for balancing smoothness and sharpness. LISMU-FCN, with much fewer parameters, can achieve imaging across similar datasets while LISMU-OCN can achieve imaging across significantly different datasets. What's more, we establish a set of simulation algorithms which are close to the real experiment, and it is of great significance and practical value in the research on lensless scattering imaging. In summary, this work provides a new solution for lensless imaging through scattering media using transfer learning in DNNs.
NESep 12, 2019
Neural Population Coding for Effective Temporal ClassificationZihan Pan, Jibin Wu, Yansong Chua et al.
Neural encoding plays an important role in faithfully describing the temporally rich patterns, whose instances include human speech and environmental sounds. For tasks that involve classifying such spatio-temporal patterns with the Spiking Neural Networks (SNNs), how these patterns are encoded directly influence the difficulty of the task. In this paper, we compare several existing temporal and population coding schemes and evaluate them on both speech (TIDIGITS) and sound (RWCP) datasets. We show that, with population neural codings, the encoded patterns are linearly separable using the Support Vector Machine (SVM). We note that the population neural codings effectively project the temporal information onto the spatial domain, thus improving linear separability in the spatial dimension, achieving an accuracy of 95\% and 100\% for TIDIGITS and RWCP datasets classified using the SVM, respectively. This observation suggests that an effective neural coding scheme greatly simplifies the classification problem such that a simple linear classifier would suffice. The above datasets are then classified using the Tempotron, an SNN-based classifier. SNN classification results agree with the SVM findings that population neural codings help to improve classification accuracy. Hence, other than the learning algorithm, effective neural encoding is just as important as an SNN designed to recognize spatio-temporal patterns. It is an often neglected but powerful abstraction that deserves further study.
SDSep 3, 2019
An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networksZihan Pan, Yansong Chua, Jibin Wu et al.
Auditory front-end is an integral part of a spiking neural network (SNN) when performing auditory cognitive tasks. It encodes the temporal dynamic stimulus, such as speech and audio, into an efficient, effective and reconstructable spike pattern to facilitate the subsequent processing. However, most of the auditory front-ends in current studies have not made use of recent findings in psychoacoustics and physiology concerning human listening. In this paper, we propose a neural encoding and decoding scheme that is optimized for speech processing. The neural encoding scheme, that we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, that include the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and the spike neural encoding by the auditory nerve. We evaluate the perceptual quality of the BAE scheme using PESQ; the performance of the BAE based on speech recognition experiments. Finally, we also built and published two spike-version of speech datasets: the Spike-TIDIGITS and the Spike-TIMIT, for researchers to use and benchmarking of future SNN research.
NEJul 2, 2019
A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural NetworksJibin Wu, Yansong Chua, Malu Zhang et al.
Spiking neural networks (SNNs) represent the most prominent biologically inspired computing model for neuromorphic computing (NC) architectures. However, due to the non-differentiable nature of spiking neuronal functions, the standard error back-propagation algorithm is not directly applicable to SNNs. In this work, we propose a tandem learning framework, that consists of an SNN and an Artificial Neural Network (ANN) coupled through weight sharing. The ANN is an auxiliary structure that facilitates the error back-propagation for the training of the SNN at the spike-train level. To this end, we consider the spike count as the discrete neural representation in the SNN, and design ANN neuronal activation function that can effectively approximate the spike count of the coupled SNN. The proposed tandem learning rule demonstrates competitive pattern recognition and regression capabilities on both the conventional frame-based and event-based vision datasets, with at least an order of magnitude reduced inference time and total synaptic operations over other state-of-the-art SNN implementations. Therefore, the proposed tandem learning rule offers a novel solution to training efficient, low latency, and high accuracy deep SNNs with low computing resources.
NEApr 3, 2019
Hardware-friendly Neural Network Architecture for Neuromorphic ComputingRoshan Gopalakrishnan, Yansong Chua, Ashish Jith Sreejith Kumar
The hardware-software co-optimization of neural network architectures is becoming a major stream of research especially due to the emergence of commercial neuromorphic chips such as the IBM Truenorth and Intel Loihi. Development of specific neural network architectures in tandem with the design of the neuromorphic hardware considering the hardware constraints will make a huge impact in the complete system level application. In this paper, we study various neural network architectures and propose one that is hardware-friendly for a neuromorphic hardware with crossbar array of synapses. Considering the hardware constraints, we demonstrate how one may design the neuromorphic hardware so as to maximize classification accuracy in the trained network architecture, while concurrently, we choose a neural network architecture so as to maximize utilization in the neuromorphic cores. We also proposed a framework for mapping a neural network onto a neuromorphic chip named as the Mapping and Debugging (MaD) framework. The MaD framework is designed to be generic in the sense that it is a Python wrapper which in principle can be integrated with any simulator tool for neuromorphic chips.
NEFeb 15, 2019
Deep Spiking Neural Network with Spike Count based Learning RuleJibin Wu, Yansong Chua, Malu Zhang et al.
Deep spiking neural networks (SNNs) support asynchronous event-driven computation, massive parallelism and demonstrate great potential to improve the energy efficiency of its synchronous analog counterpart. However, insufficient attention has been paid to neural encoding when designing SNN learning rules. Remarkably, the temporal credit assignment has been performed on rate-coded spiking inputs, leading to poor learning efficiency. In this paper, we introduce a novel spike-based learning rule for rate-coded deep SNNs, whereby the spike count of each neuron is used as a surrogate for gradient backpropagation. We evaluate the proposed learning rule by training deep spiking multi-layer perceptron (MLP) and spiking convolutional neural network (CNN) on the UCI machine learning and MNIST handwritten digit datasets. We show that the proposed learning rule achieves state-of-the-art accuracies on all benchmark datasets. The proposed learning rule allows introducing latency, spike rate and hardware constraints into the SNN learning, which is superior to the indirect approach in which conventional artificial neural networks are first trained and then converted to SNNs. Hence, it allows direct deployment to the neuromorphic hardware and supports efficient inference. Notably, a test accuracy of 98.40% was achieved on the MNIST dataset in our experiments with only 10 simulation time steps, when the same latency constraint is imposed during training.
NEJan 1, 2019
MaD: Mapping and debugging framework for implementing deep neural network onto a neuromorphic chip with crossbar array of synapsesRoshan Gopalakrishnan, Ashish Jith Sreejith Kumar, Yansong Chua
Neuromorphic systems or dedicated hardware for neuromorphic computing is getting popular with the advancement in research on different device materials for synapses, especially in crossbar architecture and also algorithms specific or compatible to neuromorphic hardware. Hence, an automated mapping of any deep neural network onto the neuromorphic chip with crossbar array of synapses and an efficient debugging framework is very essential. Here, mapping is defined as the deployment of a section of deep neural network layer onto a neuromorphic core and the generation of connection lists among population of neurons to specify the connectivity between various neuromorphic cores on the neuromorphic chip. Debugging is the verification of computations performed on the neuromorphic chip during inferencing. Together the framework becomes Mapping and Debugging (MaD) framework. MaD framework is quite general in usage as it is a Python wrapper which can be integrated with almost every simulator tools for neuromorphic chips. This paper illustrates the MaD framework in detail, considering some optimizations while mapping onto a single neuromorphic core. A classification task on MNIST and CIFAR-10 datasets are considered for test case implementation of MaD framework.
NEJul 3, 2018
Is Neuromorphic MNIST neuromorphic? Analyzing the discriminative power of neuromorphic datasets in the time domainLaxmi R. Iyer, Yansong Chua, Haizhou Li
The advantage of spiking neural networks (SNNs) over their predecessors is their ability to spike, enabling them to use spike timing for coding and efficient computing. A neuromorphic dataset should allow a neuromorphic algorithm to clearly show that a SNN is able to perform better on the dataset than an ANN. We have analyzed both N-MNIST and N-Caltech101 along these lines, but focus our study on N-MNIST. First we evaluate if additional information is encoded in the time domain in a neuromoprhic dataset. We show that an ANN trained with backpropagation on frame based versions of N-MNIST and N-Caltech101 images achieve 99.23% and 78.01% accuracy. These are the best classification accuracies obtained on these datasets to date. Second we present the first unsupervised SNN to be trained on N-MNIST and demonstrate results of 91.78%. We also use this SNN for further experiments on N-MNIST to show that rate based SNNs perform better, and precise spike timings are not important in N-MNIST. N-MNIST does not, therefore, highlight the unique ability of SNNs. The conclusion of this study opens an important question in neuromorphic engineering - what, then, constitutes a good neuromorphic dataset?
CVJul 2, 2018
Classifying neuromorphic data using a deep learning framework for image classificationRoshan Gopalakrishnan, Yansong Chua, Laxmi R Iyer
In the field of artificial intelligence, neuromorphic computing has been around for several decades. Deep learning has however made much recent progress such that it consistently outperforms neuromorphic learning algorithms in classification tasks in terms of accuracy. Specifically in the field of image classification, neuromorphic computing has been traditionally using either the temporal or rate code for encoding static images in datasets into spike trains. It is only till recently, that neuromorphic vision sensors are widely used by the neuromorphic research community, and provides an alternative to such encoding methods. Since then, several neuromorphic datasets as obtained by applying such sensors on image datasets (e.g. the neuromorphic CALTECH 101) have been introduced. These data are encoded in spike trains and hence seem ideal for benchmarking of neuromorphic learning algorithms. Specifically, we train a deep learning framework used for image classification on the CALTECH 101 and a collapsed version of the neuromorphic CALTECH 101 datasets. We obtained an accuracy of 91.66% and 78.01% for the CALTECH 101 and neuromorphic CALTECH 101 datasets respectively. For CALTECH 101, our accuracy is close to the best reported accuracy, while for neuromorphic CALTECH 101, it outperforms the last best reported accuracy by over 10%. This raises the question of the suitability of such datasets as benchmarks for neuromorphic learning algorithms.