Peter Birkholz

LG
4 papers
29 citations
Novelty 34%
AI Score 19

4 Papers

SD · Apr 20, 2022
Exploration strategies for articulatory synthesis of complex syllable onsets

Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov et al.

High-quality articulatory speech synthesis has many potential applications in speech science and technology. However, developing appropriate mappings from linguistic specification to articulatory gestures is difficult and time-consuming. In this paper we construct an optimisation-based framework as a first step towards learning these mappings without manual intervention. We demonstrate the production of syllables with complex onsets and discuss the quality of the articulatory gestures with reference to coarticulation.

LG · Mar 8, 2021
PyRCN: A Toolbox for Exploration and Application of Reservoir Computing Networks

Peter Steiner, Azarakhsh Jalalvand, Simon Stone et al.

Reservoir Computing Networks (RCNs) belong to a group of machine learning techniques that project the input space non-linearly into a high-dimensional feature space, where the underlying task can be solved linearly. Popular variants of RCNs can solve complex tasks on par with widely used deep neural networks, but with a substantially simpler training paradigm based on linear regression. In this paper, we show how to uniformly describe RCNs with small and clearly defined building blocks, and we introduce the Python toolbox PyRCN (Python Reservoir Computing Networks) for optimizing, training and analyzing RCNs on arbitrarily large datasets. The tool is based on widely used scientific packages and complies with the scikit-learn interface specification. It provides a platform for educational and exploratory analyses of RCNs, as well as a framework to apply RCNs to complex tasks including sequence processing. With a small number of building blocks, the framework allows the implementation of numerous different RCN architectures. We provide code examples on how to set up RCNs for time series prediction and for sequence classification tasks. PyRCN is around ten times faster than reference toolboxes on a benchmark task while requiring substantially less boilerplate code.
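
The core idea in this abstract (a fixed non-linear projection followed by a readout trained with linear regression) can be illustrated with a minimal sketch. It uses only the Python standard library and is not PyRCN's API: the function names, the reservoir size of 20, the sine-wave task, and the ridge value are all assumptions made for illustration.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def run_reservoir(inputs, w_in, w_res):
    """Collect states x[t] = tanh(w_in * u[t] + w_res @ x[t-1]) for a scalar input."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [
            math.tanh(w_in[i] * u + sum(w_res[i][j] * state[j] for j in range(n)))
            for i in range(n)
        ]
        states.append(state + [1.0])  # the constant 1.0 acts as a bias feature
    return states

def fit_readout(states, targets, ridge=1e-4):
    """Ridge regression: solve (S^T S + ridge * I) w = S^T y for the readout w."""
    d = len(states[0])
    gram = [
        [sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(d)]
    return solve(gram, rhs)

random.seed(0)
n_res = 20  # assumed reservoir size
w_in = [random.uniform(-1.0, 1.0) for _ in range(n_res)]
# Each row's absolute sum is at most 0.9, so the recurrent update is a contraction.
w_res = [[random.uniform(-1.0, 1.0) * 0.9 / n_res for _ in range(n_res)]
         for _ in range(n_res)]

signal = [math.sin(0.2 * t) for t in range(301)]
inputs, targets = signal[:-1], signal[1:]  # one-step-ahead prediction task

states = run_reservoir(inputs, w_in, w_res)   # fixed, untrained projection
w_out = fit_readout(states, targets)          # only the readout is trained
preds = [sum(w * s for w, s in zip(w_out, st)) for st in states]
mse = sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)
```

Only `w_out` is fitted; `w_in` and `w_res` stay at their random initial values, which is what keeps training down to a single linear solve.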

LG · Mar 8, 2021
Cluster-based Input Weight Initialization for Echo State Networks

Peter Steiner, Azarakhsh Jalalvand, Peter Birkholz

Echo State Networks (ESNs) are a special type of recurrent neural network (RNN), in which the input and recurrent connections are traditionally generated randomly, and only the output weights are trained. Despite the recent success of ESNs in various tasks of audio, image and radar recognition, we postulate that a purely random initialization is not the ideal way of initializing ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the $K$-Means algorithm on the training data. We show that for a large variety of datasets this initialization performs as well as or better than a randomly initialized ESN whilst needing significantly fewer reservoir neurons. Furthermore, we discuss that this approach provides the opportunity to estimate a suitable size of the reservoir based on prior knowledge about the data.
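
A rough sketch of what a cluster-based input weight initialization could look like is given below. The abstract does not say how the $K$-Means result is turned into weights, so this sketch assumes that each centroid is used directly as the input weight vector of one reservoir neuron (hence $K$ equals the number of neurons). The helper `cluster_input_weights`, the farthest-point seeding, and the toy data are hypothetical choices for illustration, not the authors' implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data, k, n_iter=20):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        # Next seed: the sample farthest from all seeds chosen so far.
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster runs empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def cluster_input_weights(train_data, n_neurons):
    """Hypothetical helper: one centroid per neuron, used as its input weights."""
    return kmeans(train_data, n_neurons)

random.seed(1)
# Toy training data: two well-separated 2-D blobs.
blob_a = [[random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)] for _ in range(50)]
blob_b = [[random.gauss(5.0, 0.3), random.gauss(5.0, 0.3)] for _ in range(50)]
train_data = blob_a + blob_b

# n_neurons x n_features input weight matrix, one row per reservoir neuron.
w_in = cluster_input_weights(train_data, n_neurons=2)
```

Under this assumption, prior knowledge about how many clusters the data contains translates directly into a candidate reservoir size, which is one way to read the abstract's closing remark.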

AS · May 20, 2020
Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels

Branislav Gerazov, Daniel van Niekerk, Anqi Xu et al.

The way infants use auditory cues to learn to speak despite the acoustic mismatch of their vocal apparatus is a hot topic of scientific debate. The simulation of early vocal learning using articulatory speech synthesis offers a way towards gaining a deeper understanding of this process. One of the crucial parameters in these simulations is the choice of features and a metric to evaluate the acoustic error between the synthesised sound and the reference target. We contribute by evaluating the performance of a set of 40 feature-metric combinations for the task of optimising the production of static vowels with a high-quality articulatory synthesiser. To this end we assess the usability of formant error and the projection of the feature-metric error surface in the normalised F1-F2 formant space. We show that this approach can be used to evaluate the impact of features and metrics and also to offer insight into perceptual results.
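
To make the notion of a feature-metric combination concrete, the sketch below scores a synthesised sound against a target under every pairing of a feature transform and a distance metric. The two features, the two metrics, and the toy four-bin spectra are placeholders chosen for illustration; the abstract does not list the 40 combinations the paper actually evaluates.

```python
import math
from itertools import product

# Hypothetical feature transforms applied to a magnitude spectrum (list of floats).
features = {
    "raw": lambda spec: list(spec),
    "log": lambda spec: [math.log(v + 1e-9) for v in spec],
}

# Hypothetical distance metrics between two feature vectors.
metrics = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

def score_combinations(synth, target):
    """Return {(feature_name, metric_name): error} for every feature-metric pair."""
    return {
        (f_name, m_name): metric(feat(synth), feat(target))
        for (f_name, feat), (m_name, metric)
        in product(features.items(), metrics.items())
    }

target = [1.0, 4.0, 2.0, 0.5]  # toy reference spectrum
synth = [1.0, 3.0, 2.5, 0.5]   # toy synthesised spectrum
errors = score_combinations(synth, target)
```

In an optimisation loop, each entry of `errors` would be a different objective for the same pair of sounds, which is why the choice of combination can change which articulation the optimiser settles on.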