NEMar 3, 2022
Rethinking the role of normalization and residual blocks for spiking neural networksShin-ichi Ikegawa, Ryuji Saiin, Yoshihide Sawada et al.
Biologically inspired spiking neural networks (SNNs) are widely used to realize ultralow-power energy consumption. However, deep SNNs are not easy to train due to the excessive firing of spiking neurons in the hidden layers. To tackle this problem, we propose a novel but simple normalization technique called postsynaptic potential normalization. This normalization removes the subtraction term from the standard normalization and uses the second raw moment instead of the variance as the division term. The spike firing can be controlled, enabling the training to proceed appropriating, by conducting this simple normalization to the postsynaptic potential. The experimental results show that SNNs with our normalization outperformed other models using other normalizations. Furthermore, through the pre-activation residual blocks, the proposed model can train with more than 100 layers without other special techniques dedicated to SNNs.
CVJun 20, 2022
C-SENN: Contrastive Self-Explaining Neural NetworkYoshihide Sawada, Keigo Nakamura
In this study, we use a self-explaining neural network (SENN), which learns unsupervised concepts, to acquire concepts that are easy for people to understand automatically. In concept learning, the hidden layer retains verbalizable features relevant to the output, which is crucial when adapting to real-world environments where explanations are required. However, it is known that the interpretability of concepts output by SENN is reduced in general settings, such as autonomous driving scenarios. Thus, this study combines contrastive learning with concept learning to improve the readability of concepts and the accuracy of tasks. We call this model Contrastive Self-Explaining Neural Network (C-SENN).
MLMar 16, 2023
Bayesian Generalization Error in Linear Neural Networks with Concept Bottleneck Structure and Multitask FormulationNaoki Hayashi, Yoshihide Sawada
Concept bottleneck model (CBM) is a ubiquitous method that can interpret neural networks using concepts. In CBM, concepts are inserted between the output layer and the last intermediate layer as observable values. This helps in understanding the reason behind the outputs generated by the neural networks: the weights corresponding to the concepts from the last hidden layer to the output layer. However, it has not yet been possible to understand the behavior of the generalization error in CBM since a neural network is a singular statistical model in general. When the model is singular, a one to one map from the parameters to probability distributions cannot be created. This non-identifiability makes it difficult to analyze the generalization performance. In this study, we mathematically clarify the Bayesian generalization error and free energy of CBM when its architecture is three-layered linear neural networks. We also consider a multitask problem where the neural network outputs not only the original output but also the concepts. The results show that CBM drastically changes the behavior of the parameter region and the Bayesian generalization error in three-layered linear neural networks as compared with the standard version, whereas the multitask formulation does not.
NEOct 4, 2023
Spike Accumulation Forwarding for Effective Training of Spiking Neural NetworksRyuji Saiin, Tomoya Shirakawa, Sota Yoshihara et al.
In this article, we propose a new paradigm for training spiking neural networks (SNNs), spike accumulation forwarding (SAF). It is known that SNNs are energy-efficient but difficult to train. Consequently, many researchers have proposed various methods to solve this problem, among which online training through time (OTTT) is a method that allows inferring at each time step while suppressing the memory cost. However, to compute efficiently on GPUs, OTTT requires operations with spike trains and weighted summation of spike trains during forwarding. In addition, OTTT has shown a relationship with the Spike Representation, an alternative training method, though theoretical agreement with Spike Representation has yet to be proven. Our proposed method can solve these problems; namely, SAF can halve the number of operations during the forward process, and it can be theoretically proven that SAF is consistent with the Spike Representation and OTTT, respectively. Furthermore, we confirmed the above contents through experiments and showed that it is possible to reduce memory and training time while maintaining accuracy.
LGFeb 3, 2023
Spiking Synaptic Penalty: Appropriate Penalty Term for Energy-Efficient Spiking Neural NetworksKazuma Suetake, Takuya Ushimaru, Ryuji Saiin et al.
Spiking neural networks (SNNs) are energy-efficient neural networks because of their spiking nature. However, as the spike firing rate of SNNs increases, the energy consumption does as well, and thus, the advantage of SNNs diminishes. Here, we tackle this problem by introducing a novel penalty term for the spiking activity into the objective function in the training phase. Our method is designed so as to optimize the energy consumption metric directly without modifying the network architecture. Therefore, the proposed method can reduce the energy consumption more than other methods while maintaining the accuracy. We conducted experiments for image classification tasks, and the results indicate the effectiveness of the proposed method, which mitigates the dilemma of the energy--accuracy trade-off.
LGMar 22, 2024
Magic for the Age of Quantized DNNsYoshihide Sawada, Ryuji Saiin, Kazuma Suetake
Recently, the number of parameters in DNNs has explosively increased, as exemplified by LLMs (Large Language Models), making inference on small-scale computers more difficult. Model compression technology is, therefore, essential for integration into products. In this paper, we propose a method of quantization-aware training. We introduce a novel normalization (Layer-Batch Normalization) that is independent of the mini-batch size and does not require any additional computation cost during inference. Then, we quantize the weights by the scaled round-clip function with the weight standardization. We also quantize activation functions using the same function and apply surrogate gradients to train the model with both quantized weights and the quantized activation functions. We call this method Magic for the age of Quantised DNNs (MaQD). Experimental results show that our quantization method can be achieved with minimal accuracy degradation.
MLMar 14, 2024
Upper Bound of Bayesian Generalization Error in Partial Concept Bottleneck Model (CBM): Partial CBM outperforms naive CBMNaoki Hayashi, Yoshihide Sawada
Concept Bottleneck Model (CBM) is a methods for explaining neural networks. In CBM, concepts which correspond to reasons of outputs are inserted in the last intermediate layer as observed values. It is expected that we can interpret the relationship between the output and concept similar to linear regression. However, this interpretation requires observing all concepts and decreases the generalization performance of neural networks. Partial CBM (PCBM), which uses partially observed concepts, has been devised to resolve these difficulties. Although some numerical experiments suggest that the generalization performance of PCBMs is almost as high as that of the original neural networks, the theoretical behavior of its generalization error has not been yet clarified since PCBM is singular statistical model. In this paper, we reveal the Bayesian generalization error in PCBM with a three-layered and linear architecture. The result indcates that the structure of partially observed concepts decreases the Bayesian generalization error compared with that of CBM (full-observed concepts).
LGFeb 16, 2024
Can Transformers Predict Vibrations?Fusataka Kuniyoshi, Yoshihide Sawada
Highly accurate time-series vibration prediction is an important research issue for electric vehicles (EVs). EVs often experience vibrations when driving on rough terrains, known as torsional resonance. This resonance, caused by the interaction between motor and tire vibrations, puts excessive loads on the vehicle's drive shaft. However, current damping technologies only detect resonance after the vibration amplitude of the drive shaft torque reaches a certain threshold, leading to significant loads on the shaft at the time of detection. In this study, we propose a novel approach to address this issue by introducing Resoformer, a transformer-based model for predicting torsional resonance. Resoformer utilizes time-series of the motor rotation speed as input and predicts the amplitude of torsional vibration at a specified quantile occurring in the shaft after the input series. By calculating the attention between recursive and convolutional features extracted from the measured data points, Resoformer improves the accuracy of vibration forecasting. To evaluate the model, we use a vibration dataset called VIBES (Dataset for Forecasting Vibration Transition in EVs), consisting of 2,600 simulator-generated vibration sequences. Our experiments, conducted on strong baselines built on the VIBES dataset, demonstrate that Resoformer achieves state-of-the-art results. In conclusion, our study answers the question "Can Transformers Forecast Vibrations?" While traditional transformer architectures show low performance in forecasting torsional resonance waves, our findings indicate that combining recurrent neural network and temporal convolutional network using the transformer architecture improves the accuracy of long-term vibration forecasting.
CVFeb 3, 2022
Concept Bottleneck Model with Additional Unsupervised ConceptsYoshihide Sawada, Keigo Nakamura
With the increasing demands for accountability, interpretability is becoming an essential capability for real-world AI applications. However, most methods utilize post-hoc approaches rather than training the interpretable model. In this article, we propose a novel interpretable model based on the concept bottleneck model (CBM). CBM uses concept labels to train an intermediate layer as the additional visible layer. However, because the number of concept labels restricts the dimension of this layer, it is difficult to obtain high accuracy with a small number of labels. To address this issue, we integrate supervised concepts with unsupervised ones trained with self-explaining neural networks (SENNs). By seamlessly training these two types of concepts while reducing the amount of computation, we can obtain both supervised and unsupervised concepts simultaneously, even for large-sized images. We refer to the proposed model as the concept bottleneck model with additional unsupervised concepts (CBM-AUC). We experimentally confirmed that the proposed model outperformed CBM and SENN. We also visualized the saliency map of each concept and confirmed that it was consistent with the semantic meanings.
LGJan 26, 2022
S$^3$NN: Time Step Reduction of Spiking Surrogate Gradients for Training Energy Efficient Single-Step Spiking Neural NetworksKazuma Suetake, Shin-ichi Ikegawa, Ryuji Saiin et al.
As the scales of neural networks increase, techniques that enable them to run with low computational cost and energy efficiency are required. From such demands, various efficient neural network paradigms, such as spiking neural networks (SNNs) or binary neural networks (BNNs), have been proposed. However, they have sticky drawbacks, such as degraded inference accuracy and latency. To solve these problems, we propose a single-step spiking neural network (S$^3$NN), an energy-efficient neural network with low computational cost and high precision. The proposed S$^3$NN processes the information between hidden layers by spikes as SNNs. Nevertheless, it has no temporal dimension so that there is no latency within training and inference phases as BNNs. Thus, the proposed S$^3$NN has a lower computational cost than SNNs that require time-series processing. However, S$^3$NN cannot adopt naïve backpropagation algorithms due to the non-differentiability nature of spikes. We deduce a suitable neuron model by reducing the surrogate gradient for multi-time step SNNs to a single-time step. We experimentally demonstrated that the obtained surrogate gradient allows S$^3$NN to be trained appropriately. We also showed that the proposed S$^3$NN could achieve comparable accuracy to full-precision networks while being highly energy-efficient.
LGOct 25, 2019
Study of Deep Generative Models for Inorganic Chemical CompositionsYoshihide Sawada, Koji Morikawa, Mikiya Fujii
Generative models based on generative adversarial networks (GANs) and variational autoencoders (VAEs) have been widely studied in the fields of image generation, speech generation, and drug discovery, but, only a few studies have focused on the generation of inorganic materials. Such studies use the crystal structures of materials, but material researchers rarely store this information. Thus, we generate chemical compositions without using crystal information. We use a conditional VAE (CondVAE) and a conditional GAN (CondGAN) and show that CondGAN using the bag-of-atom representation with physical descriptors generates better compositions than other generative models. Also, we evaluate the effectiveness of the Metropolis-Hastings-based atomic valency modification and the extrapolation performance, which is important to material discovery.
LGSep 3, 2019
Data-Driven Approach to Encoding and Decoding 3-D Crystal StructuresJordan Hoffmann, Louis Maestrati, Yoshihide Sawada et al.
Generative models have achieved impressive results in many domains including image and text generation. In the natural sciences, generative models have led to rapid progress in automated drug discovery. Many of the current methods focus on either 1-D or 2-D representations of typically small, drug-like molecules. However, many molecules require 3-D descriptors and exceed the chemical complexity of commonly used dataset. We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms. We construct a smooth and continuous 3-D density representation of each crystal based on the positions of different atoms. Two different neural networks were trained on a dataset of over 120,000 three-dimensional samples of single and repeating crystal structures, made by rotating the single unit cells. The first, an Encoder-Decoder pair, constructs a compressed latent space representation of each molecule and then decodes this description into an accurate reconstruction of the input. The second network segments the resulting output into atoms and assigns each atom an atomic number. By generating compressed, continuous latent spaces representations of molecules we are able to decode random samples, interpolate between two molecules, and alter known molecules.
CVApr 19, 2018
Disentangling Controllable and Uncontrollable Factors of Variation by Interacting with the WorldYoshihide Sawada
We introduce a method to disentangle controllable and uncontrollable factors of variation by interacting with the world. Disentanglement leads to good representations and is important when applying deep neural networks (DNNs) in fields where explanations are required. This study attempts to improve an existing reinforcement learning (RL) approach to disentangle controllable and uncontrollable factors of variation, because the method lacks a mechanism to represent uncontrollable obstacles. To address this problem, we train two DNNs simultaneously: one that represents the controllable object and another that represents uncontrollable obstacles. For stable training, we applied a pretraining approach using a model robust against uncontrollable obstacles. Simulation experiments demonstrate that the proposed model can disentangle independently controllable and uncontrollable factors without annotated data.
CVNov 13, 2017
All-Transfer Learning for Deep Neural Networks and its Application to Sepsis ClassificationYoshihide Sawada, Yoshikuni Sato, Toru Nakada et al.
In this article, we propose a transfer learning method for deep neural networks (DNNs). Deep learning has been widely used in many applications. However, applying deep learning is problematic when a large amount of training data are not available. One of the conventional methods for solving this problem is transfer learning for DNNs. In the field of image recognition, state-of-the-art transfer learning methods for DNNs re-use parameters trained on source domain data except for the output layer. However, this method may result in poor classification performance when the amount of target domain data is significantly small. To address this problem, we propose a method called All-Transfer Deep Learning, which enables the transfer of all parameters of a DNN. With this method, we can compute the relationship between the source and target labels by the source domain knowledge. We applied our method to actual two-dimensional electrophoresis image~(2-DE image) classification for determining if an individual suffers from sepsis; the first attempt to apply a classification approach to 2-DE images for proteomics, which has attracted considerable attention as an extension beyond genomics. The results suggest that our proposed method outperforms conventional transfer learning methods for DNNs.