Adnan Haider

h-index: 11

3 papers

6 citations

Novelty: 40%

AI Score: 18

Ranked #188,575 of 194,257 authors (top 97%); #39,708 in LG (top 99%)

3 Papers

2.2 | LG | Oct 3, 2018

Combining Natural Gradient with Hessian Free Methods for Sequence Training

Adnan Haider, P. C. Woodland

This paper presents a new optimisation approach to train Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface, which enables better paths on the parameter manifold to be traversed. The method is derived using an alternative derivation of Taylor's theorem that uses the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. The efficacy of the method is shown within a Hessian Free (HF) style optimisation framework to sequence train both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. It is shown that, for the same number of updates, the proposed approach achieves larger reductions in the word error rate (WER) than both NG and HF, and also leads to a lower WER than standard stochastic gradient descent. The paper also addresses the issue of over-fitting due to the mismatch between the training criterion and WER that primarily arises during sequence training of ReLU-DNN models.

0.3 | CL | Apr 6, 2018

Sequence Training of DNN Acoustic Models With Natural Gradient

Adnan Haider, Philip C. Woodland

Deep Neural Network (DNN) acoustic models often use discriminative sequence training, which optimises an objective function that approximates the word error rate (WER) better than frame-based training does. Sequence training is normally implemented using Stochastic Gradient Descent (SGD) or Hessian Free (HF) training. This paper proposes an alternative batch-style optimisation framework that employs a Natural Gradient (NG) approach to traverse the parameter space. By correcting the gradient according to the local curvature of the KL-divergence, the NG optimisation process converges more quickly than HF. Furthermore, the proposed NG approach can be applied to any sequence discriminative training criterion. The efficacy of the NG method is shown using experiments on a Multi-Genre Broadcast (MGB) transcription task that demonstrate both the computational efficiency and the accuracy of the resulting DNN models.
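
To make the idea of "correcting the gradient according to the local curvature of the KL-divergence" concrete, here is a minimal sketch of a damped natural gradient step solved with conjugate gradient, in the matrix-free style that HF optimisers use. It is a toy illustration on logistic regression, not the paper's sequence-training setup: the model, the data, the `damping` value and all function names are assumptions chosen for the example. For this particular model the Fisher matrix coincides with the Hessian, so the step is also a damped Newton step; that equality does not hold for sequence criteria in general.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, X, y):
    """Mean cross-entropy loss of a logistic model and its gradient."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def fisher_vector_product(w, X, v, damping):
    """Compute (F + damping * I) v without forming F.

    For a Bernoulli output, F = X^T diag(p (1 - p)) X / n.
    """
    p = sigmoid(X @ w)
    return X.T @ (p * (1 - p) * (X @ v)) / X.shape[0] + damping * v

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = np.zeros_like(b)
    r = b.copy()   # residual b - A x, with x = 0 initially
    d = r.copy()   # search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def natural_gradient_step(w, X, y, lr=1.0, damping=1e-3):
    """One update w <- w - lr * (F + damping * I)^{-1} grad."""
    _, g = loss_and_grad(w, X, y)
    direction = conjugate_gradient(
        lambda v: fisher_vector_product(w, X, v, damping), g
    )
    return w - lr * direction

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (rng.uniform(size=200) < sigmoid(X @ true_w)).astype(float)

w = np.zeros(5)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(5):
    w = natural_gradient_step(w, X, y)
final_loss, _ = loss_and_grad(w, X, y)
```

The damping term keeps the linear system well conditioned when the Fisher matrix is close to singular; its value here is arbitrary and would need tuning in practice.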

1.5 | LG | Mar 26, 2018

A Common Framework for Natural Gradient and Taylor based Optimisation using Manifold Theory

Adnan Haider

This technical report constructs a theoretical framework that relates standard Taylor approximation based optimisation methods to Natural Gradient (NG), a method which is Fisher efficient with probabilistic models. The framework is also shown to provide a mathematical justification for combining higher order methods with NG.
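
For orientation, the two families of updates the report relates can be written side by side in their standard textbook forms. The notation below (loss $L$, parameters $\theta$, step $\Delta$, Hessian $H$, Fisher matrix $F$, step size $\eta$) is an assumption for illustration and is not taken from the report itself.

```latex
\begin{align}
  L(\theta + \Delta) &\approx L(\theta) + \nabla L(\theta)^{\top} \Delta
      + \tfrac{1}{2}\, \Delta^{\top} H \Delta,
  & \Delta_{\mathrm{Newton}} &= -H^{-1} \nabla L(\theta), \\
  \mathrm{KL}\!\left(p_{\theta} \,\|\, p_{\theta + \Delta}\right)
      &\approx \tfrac{1}{2}\, \Delta^{\top} F \Delta,
  & \Delta_{\mathrm{NG}} &= -\eta\, F^{-1} \nabla L(\theta),
\end{align}
where
\[
  F = \mathbb{E}_{x \sim p_{\theta}}\!\left[
      \nabla_{\theta} \log p_{\theta}(x)\,
      \nabla_{\theta} \log p_{\theta}(x)^{\top} \right].
\]
```

The first line is the second-order Taylor model that Newton-type and HF methods minimise; the second is the local quadratic approximation of the KL-divergence, whose curvature matrix $F$ preconditions the gradient in NG.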