CL | SD | AS | Dec 10, 2021

Shennong: a Python toolbox for audio speech features extraction

arXiv:2112.05555v1 | 1 citation | Has Code
Originality: Synthesis-oriented
AI Analysis

Shennong gives researchers and practitioners in speech processing an open-source, easy-to-use framework, but the contribution is incremental: it consolidates existing methods into a single toolbox.

The authors introduce Shennong, a Python toolbox for extracting speech features. It implements various established algorithms, aims to replace or complement tools such as Kaldi or Praat, and is demonstrated on applications such as phone discrimination and pitch estimation under noise.

We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-Frequency Cepstral Filterbanks or Predictive Linear Filters, pre-trained neural networks, pitch estimators, speaker normalization methods, and post-processing algorithms. Shennong is an open-source, easy-to-use, reliable, and extensible framework. The use of Python makes integration with other speech modeling and machine learning tools easy. It aims to replace or complement several heterogeneous software packages, such as Kaldi or Praat. After describing the Shennong software architecture, its core components, and the implemented algorithms, this paper illustrates its use in three applications: a comparison of speech feature performance on a phone discrimination task, an analysis of a Vocal Tract Length Normalization model as a function of the speech duration used for training, and a comparison of pitch estimation algorithms under various noise conditions.
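To make the third application more concrete, here is a minimal sketch of what "pitch estimation under noise" means in practice. It does not use Shennong's API, and plain autocorrelation is a stand-in technique chosen for brevity, not necessarily one of the estimators compared in the paper. The function names (`add_noise`, `estimate_pitch`), the 200 Hz test tone, the 16 kHz sample rate, and the 60 to 400 Hz search range are all assumptions made for illustration.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Return the pitch in Hz whose lag maximises the raw autocorrelation.

    Only lags corresponding to pitches between fmin and fmax are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical test signal: a 50 ms, 200 Hz sine sampled at 16 kHz.
SAMPLE_RATE = 16000
TRUE_PITCH = 200.0
clean = [
    math.sin(2 * math.pi * TRUE_PITCH * i / SAMPLE_RATE)
    for i in range(800)
]

rng = random.Random(0)  # fixed seed so the noisy frame is reproducible
noisy = add_noise(clean, snr_db=20, rng=rng)

clean_estimate = estimate_pitch(clean, SAMPLE_RATE)
noisy_estimate = estimate_pitch(noisy, SAMPLE_RATE)
print(f"clean: {clean_estimate:.1f} Hz, noisy (20 dB SNR): {noisy_estimate:.1f} Hz")
```

Sweeping `snr_db` downward and recording the estimation error at each level gives a toy version of the kind of comparison the abstract describes. A real evaluation would use recorded speech, realistic noise types, and the estimators actually shipped with the toolbox.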

Code Implementations: 1 repo
