NE, CV, LG, IV, ML
May 24, 2019

Synthesizing Images from Spatio-Temporal Representations using Spike-based Backpropagation

arXiv:1906.08861v1
26 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for low-power neuromorphic hardware applications by enabling cross-modal image synthesis, though it is incremental as it adapts existing methods to spiking neural networks.

The paper tackles the problem of synthesizing images from audio in a spike-based environment by using spiking auto-encoders to create shared spatio-temporal representations, achieving competitive performance against artificial neural networks on tasks like converting TI-46 audio samples to MNIST images.

Spiking neural networks (SNNs) offer a promising alternative to current artificial neural networks to enable low-power event-driven neuromorphic hardware. Spike-based neuromorphic applications require processing and extracting meaningful information from spatio-temporal data, represented as series of spike trains over time. In this paper, we propose a method to synthesize images from multiple modalities in a spike-based environment. We use spiking auto-encoders to convert image and audio inputs into compact spatio-temporal representations that are then decoded for image synthesis. For this, we use a direct training algorithm that computes the loss on the membrane potential of the output layer and back-propagates it, using a sigmoid approximation of the neuron's activation function to enable differentiability. The spiking autoencoders are benchmarked on MNIST and Fashion-MNIST and achieve very low reconstruction loss, comparable to ANNs. Then, spiking autoencoders are trained to learn meaningful spatio-temporal representations of the data across the two modalities, audio and visual. We synthesize images from audio in a spike-based environment by first generating, and then utilizing, such shared multi-modal spatio-temporal representations. Our audio-to-image synthesis model is tested on the task of converting TI-46 digit audio samples to MNIST images. We are able to synthesize images with high fidelity, and the model achieves competitive performance against ANNs.
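
The training idea in the abstract (a loss on the output membrane potential, with a sigmoid standing in for the non-differentiable spike function during back-propagation) can be illustrated with a minimal single-neuron sketch. Everything specific here is an assumption for illustration, not taken from the paper: the leaky integrate-and-fire update, the hard reset, the `leak`, `threshold`, and `slope` values, and the mean-squared-error form of the loss.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def surrogate_spike_grad(v, threshold=1.0, slope=5.0):
    """Derivative of sigmoid(slope * (v - threshold)).

    Used in the backward pass in place of the derivative of the hard
    threshold, which is zero almost everywhere. `slope` is an assumed
    hyperparameter.
    """
    s = sigmoid(slope * (v - threshold))
    return slope * s * (1.0 - s)


def lif_neuron(input_currents, threshold=1.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron (assumed model).

    Returns the spike train and the membrane potential recorded at each
    timestep before any reset.
    """
    v = 0.0
    spikes, potentials = [], []
    for current in input_currents:
        v = leak * v + current          # leak, then integrate the input
        potentials.append(v)
        fired = v >= threshold
        spikes.append(1 if fired else 0)
        if fired:
            v = 0.0                     # hard reset after a spike (assumption)
    return spikes, potentials


def membrane_mse_loss(potentials, targets):
    """Half mean-squared error between output membrane potentials and targets."""
    n = len(potentials)
    return 0.5 * sum((p - t) ** 2 for p, t in zip(potentials, targets)) / n


# Usage: constant input drives the neuron over threshold on the second step.
spikes, potentials = lif_neuron([0.6, 0.6, 0.6])
grads = [surrogate_spike_grad(v) for v in potentials]
```

The surrogate gradient is largest when the membrane potential sits at the threshold and decays smoothly on either side, so neurons close to firing receive the strongest error signal.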

