LGJan 31, 2023
Tricking AI chips into Simulating the Human Brain: A Detailed Performance AnalysisLennart P. L. Landsmeer, Max C. W. Engelen, Rene Miedema et al.
Challenging the Nvidia monopoly, dedicated AI-accelerator chips have begun emerging for tackling the computational challenge that the inference and, especially, the training of modern deep neural networks (DNNs) poses to modern computers. The field has been ridden with studies assessing the performance of these contestants across various DNN model types. However, AI-experts are aware of the limitations of current DNNs and have been working towards the fourth AI wave which will, arguably, rely on more biologically inspired models, predominantly on spiking neural networks (SNNs). At the same time, GPUs have been heavily used for simulating such models in the field of computational neuroscience, yet AI-chips have not been tested on such workloads. The current paper aims at filling this important gap by evaluating multiple, cutting-edge AI-chips (Graphcore IPU, GroqChip, Nvidia GPU with Tensor Cores and Google TPU) on simulating a highly biologically detailed model of a brain region, the inferior olive (IO). This IO application stress-tests the different AI-platforms for highlighting architectural tradeoffs by varying its compute density, memory requirements and floating-point numerical accuracy. Our performance analysis reveals that the simulation problem maps extremely well onto the GPU and TPU architectures, which for networks of 125,000 cells leads to a 28x respectively 1,208x speedup over CPU runtimes. At this speed, the TPU sets a new record for largest real-time IO simulation. The GroqChip outperforms both platforms for small networks but, due to implementing some floating-point operations at reduced accuracy, is found not yet usable for brain simulation.
ARNov 8, 2023
A Lightweight Architecture for Real-Time Neuronal-Spike ClassificationMuhammad Ali Siddiqi, David Vrijenhoek, Lennart P. L. Landsmeer et al.
Electrophysiological recordings of neural activity in a mouse's brain are very popular among neuroscientists for understanding brain function. One particular area of interest is acquiring recordings from the Purkinje cells in the cerebellum in order to understand brain injuries and the loss of motor functions. However, current setups for such experiments do not allow the mouse to move freely and, thus, do not capture its natural behaviour since they have a wired connection between the animal's head stage and an acquisition device. In this work, we propose a lightweight neuronal-spike detection and classification architecture that leverages on the unique characteristics of the Purkinje cells to discard unneeded information from the sparse neural data in real time. This allows the (condensed) data to be easily stored on a removable storage device on the head stage, alleviating the need for wires. Synthesis results reveal a >95% overall classification accuracy while still resulting in a small-form-factor design, which allows for the free movement of mice during experiments. Moreover, the power-efficient nature of the design and the usage of STT-RAM (Spin Transfer Torque Magnetic Random Access Memory) as the removable storage allows the head stage to easily operate on a tiny battery for up to approximately 4 days.
LGOct 14, 2020Code
Privacy-Preserving Object Detection & Localization Using Distributed Machine Learning: A Case Study of Infant Eyeblink ConditioningStefan Zwaard, Henk-Jan Boele, Hani Alers et al.
Distributed machine learning is becoming a popular model-training method due to privacy, computational scalability, and bandwidth capacities. In this work, we explore scalable distributed-training versions of two algorithms commonly used in object detection. A novel distributed training algorithm using Mean Weight Matrix Aggregation (MWMA) is proposed for Linear Support Vector Machine (L-SVM) object detection based in Histogram of Orientated Gradients (HOG). In addition, a novel Weighted Bin Aggregation (WBA) algorithm is proposed for distributed training of Ensemble of Regression Trees (ERT) landmark localization. Both algorithms do not restrict the location of model aggregation and allow custom architectures for model distribution. For this work, a Pool-Based Local Training and Aggregation (PBLTA) architecture for both algorithms is explored. The application of both algorithms in the medical field is examined using a paradigm from the fields of psychology and neuroscience - eyeblink conditioning with infants - where models need to be trained on facial images while protecting participant privacy. Using distributed learning, models can be trained without sending image data to other nodes. The custom software has been made available for public use on GitHub: https://github.com/SLWZwaard/DMT. Results show that the aggregation of models for the HOG algorithm using MWMA not only preserves the accuracy of the model but also allows for distributed learning with an accuracy increase of 0.9% compared with traditional learning. Furthermore, WBA allows for ERT model aggregation with an accuracy increase of 8% when compared to single-node models.
CRJan 17, 2022
Improving the Security of the IEEE 802.15.6 Standard for Medical BANsMuhammad Ali Siddiqi, Georg Hahn, Said Hamdioui et al.
A Medical Body Area Network (MBAN) is an ensemble of collaborating, potentially heterogeneous, medical devices located inside, on the surface of or around the human body with the objective of tackling one or multiple medical conditions of the MBAN host. These devices -- which are a special category of Wireless Body Area Networks (WBANs) -- collect, process and transfer medical data outside of the network, while in some cases they also administer medical treatment autonomously. Since communication is so pivotal to their operation, the newfangled IEEE 802.15.6 standard is aimed at the communication aspects of WBANs. It places a set of physical and communication constraints while it also includes association/disassociation protocols and security services that WBAN applications need to comply with. However, the security specifications put forward by the standard can be easily shown to be insufficient when considering realistic MBAN use cases and need further enhancements. The present work addresses these shortcomings by, first, providing a structured analysis of the IEEE 802.15.6 security features and, afterwards, proposing comprehensive and tangible recommendations on improving the standard's security.
CVJun 1, 2020
Real-Time Face and Landmark Localization for Eyeblink DetectionPaul Bakker, Henk-Jan Boele, Zaid Al-Ars et al.
Pavlovian eyeblink conditioning is a powerful experiment used in the field of neuroscience to measure multiple aspects of how we learn in our daily life. To track the movement of the eyelid during an experiment, researchers have traditionally made use of potentiometers or electromyography. More recently, the use of computer vision and image processing alleviated the need for these techniques but currently employed methods require human intervention and are not fast enough to enable real-time processing. In this work, a face- and landmark-detection algorithm have been carefully combined in order to provide fully automated eyelid tracking, and have further been accelerated to make the first crucial step towards online, closed-loop experiments. Such experiments have not been achieved so far and are expected to offer significant insights in the workings of neurological and psychiatric disorders. Based on an extensive literature search, various different algorithms for face detection and landmark detection have been analyzed and evaluated. Two algorithms were identified as most suitable for eyelid detection: the Histogram-of-Oriented-Gradients (HOG) algorithm for face detection and the Ensemble-of-Regression-Trees (ERT) algorithm for landmark detection. These two algorithms have been accelerated on GPU and CPU, achieving speedups of 1,753$\times$ and 11$\times$, respectively. To demonstrate the usefulness of our eyelid-detection algorithm, a research hypothesis was formed and a well-established neuroscientific experiment was employed: eyeblink detection. Our experimental evaluation reveals an overall application runtime of 0.533 ms per frame, which is 1,101$\times$ faster than the sequential implementation and well within the real-time requirements of eyeblink conditioning in humans, i.e. faster than 500 frames per second.
CRFeb 21, 2020
IMDfence: Architecting a Secure Protocol for Implantable Medical DevicesMuhammad Ali Siddiqi, Christian Doerr, Christos Strydis
Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead which, however, fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, entity authentication, remote monitoring and system scalability. The protocol also allows emergency access that results in the graceful degradation of offered services without compromising security and patient safety. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence achieves the above security requirements at a mere less than 7% increase in total IMD energy consumption, and less than 14 ms and 9 kB increase in system delay and memory footprint, respectively.
CRApr 15, 2019
Towards Realistic Battery-DoS Protection of Implantable Medical DevicesMuhammad Ali Siddiqi, Christos Strydis
Modern Implantable Medical Devices (IMDs) feature wireless connectivity, which makes them vulnerable to security attacks. Particular to IMDs is the battery Denial-of-Service attack whereby attackers aim to fully deplete the battery by occupying the IMD with continuous authentication requests. Zero-Power Defense (ZPD) based on energy harvesting is known to be an excellent protection against these attacks. This paper establishes essential design specifications for employing ZPD techniques in IMDs, offers a critical review of ZPD techniques found in literature and, subsequently, gives crucial recommendations for developing comprehensive ZPD solutions.
CRApr 15, 2019
IMD Security vs. Energy: Are We Tilting at Windmills?: POSTERMuhammad Ali Siddiqi, Christos Strydis
Implantable Medical Devices (IMDs) such as pacemakers and neurostimulators are highly constrained in terms of energy. In addition, the wireless-communication facilities of these devices also impose security requirements considering their life-critical nature. However, security solutions that provide considerable coverage are generally considered to be too taxing on an IMD battery. Consequently, there has been a tendency to adopt ultra-lightweight security primitives for IMDs in literature. In this work, we demonstrate that the recent advances in embedded computing in fact enable the IMDs to use more mainstream security primitives, which do not need to compromise significantly on security for fear of impacting IMD autonomy.
NEDec 5, 2016
BrainFrame: A node-level heterogeneous accelerator platform for neuron simulationsGeorgios Smaragdos, Georgios Chatzikonstantis, Rahul Kukreja et al.
Objective: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit for a single acceleration (or homogeneous) platform to effectively address the complete array of modeling requirements. Approach: In this paper we propose and build BrainFrame, a heterogeneous acceleration platform, incorporating three distinct acceleration technologies, a Dataflow Engine, a Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different instances of a state-of-the-art neuron model, modeling the Inferior- Olivary Nucleus using a biophysically-meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal- network dimensions but also different network-connectivity circumstances that can drastically change application workload characteristics. Main results: The synthetic approach of three HPC technologies demonstrated that BrainFrame is better able to cope with the modeling diversity encountered. Our performance analysis shows clearly that the model directly affect performance and all three technologies are required to cope with all the model use cases.