SDSep 6, 2016
Discriminative Enhancement for Single Channel Audio Source Separation using Deep Neural NetworksEmad M. Grais, Gerard Roma, Andrew J. R. Simpson et al.
The sources separated by most single channel audio source separation techniques are usually distorted and each separated source contains residual signals from the other sources. To tackle this problem, we propose to enhance the separated sources to decrease the distortion and interference between the separated sources using deep neural networks (DNNs). Two different DNNs are used in this work. The first DNN is used to separate the sources from the mixed signal. The second DNN is used to enhance the separated signals. To consider the interactions between the separated sources, we propose to use a single DNN to enhance all the separated sources together. To reduce the residual signals of one source from the other separated sources (interference), we train the DNN for enhancement discriminatively to maximize the dissimilarity between the predicted sources. The experimental results show that using discriminative enhancement decreases the distortion and interference between the separated sources.
LGFeb 25, 2016
Hierarchical Conflict Propagation: Sequence Learning in a Recurrent Deep Neural NetworkAndrew J. R. Simpson
Recurrent neural networks (RNN) are capable of learning to encode and exploit activation history over an arbitrary timescale. However, in practice, state of the art gradient descent based training methods are known to suffer from difficulties in learning long term dependencies. Here, we describe a novel training method that involves concurrent parallel cloned networks, each sharing the same weights, each trained at different stimulus phase and each maintaining independent activation histories. Training proceeds by recursively performing batch-updates over the parallel clones as activation history is progressively increased. This allows conflicts to propagate hierarchically from short-term contexts towards longer-term contexts until they are resolved. We illustrate the parallel clones method and hierarchical conflict propagation with a character-level deep RNN tasked with memorizing a paragraph of Moby Dick (by Herman Melville).
NEOct 19, 2015
Qualitative Projection Using Deep Neural NetworksAndrew J. R. Simpson
Deep neural networks (DNN) abstract by demodulating the output of linear filters. In this article, we refine this definition of abstraction to show that the inputs of a DNN are abstracted with respect to the filters. Or, to restate, the abstraction is qualified by the filters. This leads us to introduce the notion of qualitative projection. We use qualitative projection to abstract MNIST hand-written digits with respect to the various dogs, horses, planes and cars of the CIFAR dataset. We then classify the MNIST digits according to the magnitude of their dogness, horseness, planeness and carness qualities, illustrating the generality of qualitative projection.
LGOct 8, 2015
Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient DescentAndrew J. R. Simpson
When training deep neural networks, it is typically assumed that the training examples are uniformly difficult to learn. Or, to restate, it is assumed that the training error will be uniformly distributed across the training examples. Based on these assumptions, each training example is used an equal number of times. However, this assumption may not be valid in many cases. "Oddball SGD" (novelty-driven stochastic gradient descent) was recently introduced to drive training probabilistically according to the error distribution - training frequency is proportional to training error magnitude. In this article, using a deep neural network to encode a video, we show that oddball SGD can be used to enforce uniform error across the training set.
LGSep 18, 2015
"Oddball SGD": Novelty Driven Stochastic Gradient Descent for Training Deep Neural NetworksAndrew J. R. Simpson
Stochastic Gradient Descent (SGD) is arguably the most popular of the machine learning methods applied to training deep neural networks (DNN) today. It has recently been demonstrated that SGD can be statistically biased so that certain elements of the training set are learned more rapidly than others. In this article, we place SGD into a feedback loop whereby the probability of selection is proportional to error magnitude. This provides a novelty-driven oddball SGD process that learns more rapidly than traditional SGD by prioritising those elements of the training set with the largest novelty (error). In our DNN example, oddball SGD trains some 50x faster than regular SGD.
LGSep 17, 2015
Taming the ReLU with Parallel Dither in a Deep Neural NetworkAndrew J. R. Simpson
Rectified Linear Units (ReLU) seem to have displaced traditional 'smooth' nonlinearities as activation-function-du-jour in many - but not all - deep neural network (DNN) applications. However, nobody seems to know why. In this article, we argue that ReLU are useful because they are ideal demodulators - this helps them perform fast abstract learning. However, this fast learning comes at the expense of serious nonlinear distortion products - decoy features. We show that Parallel Dither acts to suppress the decoy features, preventing overfitting and leaving the true features cleanly demodulated for rapid, reliable learning.
LGSep 10, 2015
Use it or Lose it: Selective Memory and Forgetting in a Perpetual Learning MachineAndrew J. R. Simpson
In a recent article we described a new type of deep neural network - a Perpetual Learning Machine (PLM) - which is capable of learning 'on the fly' like a brain by existing in a state of Perpetual Stochastic Gradient Descent (PSGD). Here, by simulating the process of practice, we demonstrate both selective memory and selective forgetting when we introduce statistical recall biases during PSGD. Frequently recalled memories are remembered, whilst memories recalled rarely are forgotten. This results in a 'use it or lose it' stimulus driven memory process that is similar to human memory.
LGSep 3, 2015
On-the-Fly Learning in a Perpetual Learning MachineAndrew J. R. Simpson
Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine; a new type of DNN that is capable of brain-like dynamic 'on the fly' learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we provide the means to unify learning and memory within a machine learning framework. We also explore the elegant duality of abstraction and synthesis: the Yin and Yang of deep learning.
LGAug 28, 2015
Parallel Dither and Dropout for Regularising Deep Neural NetworksAndrew J. R. Simpson
Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch-SGD are substantially better than what is possible with batch-SGD. Furthermore, our results demonstrate that dither and dropout are complimentary.
LGAug 19, 2015
Dither is Better than Dropout for Regularising Deep Neural NetworksAndrew J. R. Simpson
Regularisation of deep neural networks (DNN) during training is critical to performance. By far the most popular method is known as dropout. Here, cast through the prism of signal processing theory, we compare and contrast the regularisation effects of dropout with those of dither. We illustrate some serious inherent limitations of dropout and demonstrate that dither provides a more effective regulariser.
LGMay 22, 2015
Instant Learning: Parallel Deep Neural Networks and Convolutional BootstrappingAndrew J. R. Simpson
Although deep neural networks (DNN) are able to scale with direct advances in computational power (e.g., memory and processing speed), they are not well suited to exploit the recent trends for parallel architectures. In particular, gradient descent is a sequential process and the resulting serial dependencies mean that DNN training cannot be parallelized effectively. Here, we show that a DNN may be replicated over a massive parallel architecture and used to provide a cumulative sampling of local solution space which results in rapid and robust learning. We introduce a complimentary convolutional bootstrapping approach that enhances performance of the parallel architecture further. Our parallelized convolutional bootstrapping DNN out-performs an identical fully-trained traditional DNN after only a single iteration of training.
SDApr 28, 2015
Time-Frequency Trade-offs for Audio Source Separation with Binary MasksAndrew J. R. Simpson
The short-time Fourier transform (STFT) provides the foundation of binary-mask based audio source separation approaches. In computing a spectrogram, the STFT window size parameterizes the trade-off between time and frequency resolution. However, it is not yet known how this parameter affects the operation of the binary mask in terms of separation quality for real-world signals such as speech or music. Here, we demonstrate that the trade-off between time and frequency in the STFT, used to perform ideal binary mask separation, depends upon the types of source that are to be separated. In particular, we demonstrate that different window sizes are optimal for separating different combinations of speech and musical signals. Our findings have broad implications for machine audition and machine learning in general.
SDApr 17, 2015
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural NetworkAndrew J. R. Simpson, Gerard Roma, Mark D. Plumbley
Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.
SDApr 12, 2015
Deep Transform: Cocktail Party Source Separation via Complex Convolution in a Deep Neural NetworkAndrew J. R. Simpson
Convolutional deep neural networks (DNN) are state of the art in many engineering problems but have not yet addressed the issue of how to deal with complex spectrograms. Here, we use circular statistics to provide a convenient probabilistic estimate of spectrogram phase in a complex convolutional DNN. In a typical cocktail party source separation scenario, we trained a convolutional DNN to re-synthesize the complex spectrograms of two source speech signals given a complex spectrogram of the monaural mixture - a discriminative deep transform (DT). We then used this complex convolutional DT to obtain probabilistic estimates of the magnitude and phase components of the source spectrograms. Our separation results are on a par with equivalent binary-mask based non-complex separation approaches.
SDMar 24, 2015
Probabilistic Binary-Mask Cocktail-Party Source Separation in a Convolutional Deep Neural NetworkAndrew J. R. Simpson
Separation of competing speech is a key challenge in signal processing and a feat routinely performed by the human auditory brain. A long standing benchmark of the spectrogram approach to source separation is known as the ideal binary mask. Here, we train a convolutional deep neural network, on a two-speaker cocktail party problem, to make probabilistic predictions about binary masks. Our results approach ideal binary mask performance, illustrating that relatively simple deep neural networks are capable of robust binary mask prediction. We also illustrate the trade-off between prediction statistics and separation quality.
SDMar 20, 2015
Deep Transform: Cocktail Party Source Separation via Probabilistic Re-SynthesisAndrew J. R. Simpson
In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNNs as convolutive deep transform (CDT) devices to perform probabilistic re-synthesis. The CDTs operated directly in the time-domain. Our simulations demonstrate that very simple neural networks are capable of exploiting monaural and binaural information available in a cocktail party listening scenario.
SDMar 19, 2015
Deep Transform: Time-Domain Audio Error Correction via Probabilistic Re-SynthesisAndrew J. R. Simpson
In the process of recording, storage and transmission of time-domain audio signals, errors may be introduced that are difficult to correct in an unsupervised way. Here, we train a convolutional deep neural network to re-synthesize input time-domain speech signals at its output layer. We then use this abstract transformation, which we call a deep transform (DT), to perform probabilistic re-synthesis on further speech (of the same speaker) which has been degraded. Using the convolutive DT, we demonstrate the recovery of speech audio that has been subject to extreme degradation. This approach may be useful for correction of errors in communications devices.
LGFeb 16, 2015
Deep Transform: Error Correction via Probabilistic Re-SynthesisAndrew J. R. Simpson
Errors in data are usually unwelcome and so some means to correct them is useful. However, it is difficult to define, detect or correct errors in an unsupervised way. Here, we train a deep neural network to re-synthesize its inputs at its output layer for a given class of data. We then exploit the fact that this abstract transformation, which we call a deep transform (DT), inherently rejects information (errors) existing outside of the abstract feature space. Using the DT to perform probabilistic re-synthesis, we demonstrate the recovery of data that has been subject to extreme degradation.
LGFeb 13, 2015
Abstract Learning via Demodulation in a Deep Neural NetworkAndrew J. R. Simpson
Inspired by the brain, deep neural networks (DNN) are thought to learn abstract representations through their hierarchical architecture. However, at present, how this happens is not well understood. Here, we demonstrate that DNN learn abstract representations by a process of demodulation. We introduce a biased sigmoid activation function and use it to show that DNN learn and perform better when optimized for demodulation. Our findings constitute the first unambiguous evidence that DNN perform abstract learning in practical use. Our findings may also explain abstract learning in the human brain.
LGFeb 12, 2015
Over-Sampling in a Deep Neural NetworkAndrew J. R. Simpson
Deep neural networks (DNN) are the state of the art on many engineering problems such as computer vision and audition. A key factor in the success of the DNN is scalability - bigger networks work better. However, the reason for this scalability is not yet well understood. Here, we interpret the DNN as a discrete system, of linear filters followed by nonlinear activations, that is subject to the laws of sampling theory. In this context, we demonstrate that over-sampled networks are more selective, learn faster and learn more robustly. Our findings may ultimately generalize to the human brain.