SD | Apr 3, 2021 | Code
Mixture of orthogonal sequences made from extended time-stretched pulses enables measurement of involuntary voice fundamental frequency response to pitch perturbation
Hideki Kawahara, Toshie Matsui, Kohei Yatabe et al.
Auditory feedback plays an essential role in the regulation of the fundamental frequency of voiced sounds. The fundamental frequency also responds to auditory stimulation other than the speaker's voice. We propose to use this response of the fundamental frequency of sustained vowels to frequency-modulated test signals for investigating involuntary control of voice pitch. This involuntary response is difficult to identify and isolate with the conventional paradigm, which uses step-shaped pitch perturbation. We recently developed a versatile measurement method using a mixture of orthogonal sequences made from a set of extended time-stretched pulses (TSP). In this article, we extend our approach and design a set of test signals that use the mixture to modulate the fundamental frequency of artificial signals. For testing the response, the experimenter presents the modulated signal aurally while the subject is voicing sustained vowels. We developed a tool for conducting this test quickly and interactively. We make the tool available as open source and also provide executable GUI-based applications. Preliminary tests revealed that the proposed method consistently provides compensatory responses with about 100 ms latency, representing involuntary control. Finally, we discuss future applications of the proposed method for objective and non-invasive auditory response measurements.
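As a rough illustration of what "modulating the fundamental frequency of an artificial signal" can look like, the sketch below synthesizes a sine tone whose instantaneous frequency follows a modulation signal given in musical cents. Everything specific here is an assumption made for illustration: the function name `fm_tone`, the 120 Hz base frequency, and the 4 Hz sinusoidal modulation of 25 cents depth. In particular, the modulation signal is a plain sinusoid, not the mixture of orthogonal sequences described in the abstract.

```python
import numpy as np

def fm_tone(base_fo_hz, modulation_cents, fs):
    """Sine tone whose instantaneous frequency follows `modulation_cents`.

    Hypothetical helper for illustration; not taken from the authors' tool.
    """
    # Convert the modulation from cents to a frequency ratio (1200 cents = 1 octave).
    inst_fo = base_fo_hz * 2.0 ** (modulation_cents / 1200.0)
    # Integrate the instantaneous frequency to obtain the phase.
    phase = 2.0 * np.pi * np.cumsum(inst_fo) / fs
    return np.sin(phase), inst_fo

fs = 44100                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                       # one second of time stamps
modulation_cents = 25.0 * np.sin(2.0 * np.pi * 4.0 * t)  # assumed stand-in modulator
tone, inst_fo = fm_tone(120.0, modulation_cents, fs)
```

Working in cents keeps the modulation depth perceptually comparable across base frequencies, which is why the conversion to a frequency ratio happens inside the helper.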
AS | Sep 10, 2019 | Code
Frequency domain variant of Velvet noise and its application to acoustic measurements
Hideki Kawahara, Ken-Ichi Sakakibara, Mitsunori Mizumachi et al.
We propose a new family of test signals for acoustic measurements such as impulse response, nonlinearity, and the effects of background noise. The proposed family compensates for difficulties in existing families, namely the swept sine (SS) and pseudo-random noise such as the maximum length sequence (MLS). The proposed family uses the frequency domain variant of the Velvet noise (FVN) as its building block. An FVN is an impulse response of an all-pass filter and yields the unit impulse when convolved with the time-reversed version of itself. In this respect, FVN is a member of the time-stretched pulse (TSP) family in the broadest sense. The high degree of freedom in designing an FVN opens a vast range of applications in acoustic measurement. Among other possibilities, we introduce the following applications and their specific procedures. a) Spectrum shaping adaptive to background noise. b) Simultaneous measurement of impulse responses of multiple acoustic paths. d) Simultaneous measurement of linear and nonlinear components of an acoustic path. e) Automatic procedure for time axis alignment of the source and the receiver when they are using independent clocks in acoustic impulse response measurement. We implemented a reference measurement tool equipped with all these procedures. The MATLAB source code and related materials are open-sourced and placed in a GitHub repository.
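The all-pass property stated above (convolution with the time-reversed signal yields the unit impulse) can be checked numerically. The following is a minimal sketch under stated assumptions: it uses a generic all-pass response with uniformly random phase rather than the actual FVN phase design, and it uses circular (DFT-based) convolution and circular time reversal. The helper names are hypothetical and not taken from the authors' MATLAB tools.

```python
import numpy as np

def allpass_impulse_response(n_fft, rng):
    """Real impulse response whose DFT has unit magnitude at every bin.

    Assumption: random phase stands in for the FVN phase design.
    """
    phase = np.zeros(n_fft // 2 + 1)
    # Keep the DC and Nyquist phases at zero so the time signal is real.
    phase[1:n_fft // 2] = rng.uniform(-np.pi, np.pi, n_fft // 2 - 1)
    return np.fft.irfft(np.exp(1j * phase), n=n_fft)

def circular_time_reverse(x):
    """Return x[(-n) mod N], the circular time reversal of x."""
    return np.roll(x[::-1], 1)

def circular_convolve(x, y):
    """Circular convolution computed through the DFT."""
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

rng = np.random.default_rng(0)
n_fft = 1024
h = allpass_impulse_response(n_fft, rng)
recovered = circular_convolve(h, circular_time_reverse(h))

unit_impulse = np.zeros(n_fft)
unit_impulse[0] = 1.0
print(np.allclose(recovered, unit_impulse))
```

The check works because time reversal of a real signal conjugates its spectrum, so the product of the two spectra is the squared magnitude, which is one at every bin for an all-pass response.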
AS | Feb 22, 2017 | Code
A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis
Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno et al.
We formulated and implemented a procedure to generate aliasing-free excitation source signals. It uses a new antialiasing filter in the continuous time domain followed by an IIR digital filter for response equalization. We introduced a cosine-series-based general design procedure for the new antialiasing function. We applied this new procedure to implement the antialiased Fujisaki-Ljungqvist model. We also applied it to revise our previous implementation of the antialiased Fant-Liljencrants model. A combination of these signals and a lattice implementation of the time-varying vocal tract model provides a reliable and flexible basis to test fundamental frequency (fo) extractors and source aperiodicity analysis methods. MATLAB implementations of these antialiased excitation source models are available as part of our open-source tools for speech science.
3.8 | AS | Apr 8
Disentangling peripheral hearing loss from central and cognitive effects on speech intelligibility in older adults
Toshio Irino, Ayako Yamamoto, Fuki Miyazaki
Age-related hearing loss (HL) reduces speech intelligibility (SI) in older adults (OAs). However, deficits in central and cognitive processing also substantially impact SI. Understanding these contributions is essential for explaining individual differences and developing effective assistive hearing strategies. This study presents a framework that distinguishes peripheral HL from central and cognitive influences on SI. This framework uses the Wakayama University Hearing Impairment Simulator (WHIS), and the Gammachirp Envelope Similarity Index (GESI), an objective measure of intelligibility. First, speech-in-noise tests were conducted with young, normal-hearing listeners (YNHs) using WHIS to simulate the audiogram of a target OA. The target OA achieved SI scores comparable to or higher than those of YNHs with simulated HL, suggesting contributions beyond peripheral hearing function. Then, GESI was used to predict SI scores for YNHs and OAs across different hearing levels. The prediction accuracy was comparable for both groups. Interestingly, many OAs' subjective SI scores were higher than those predicted using parameters derived from YNHs' experiments. This finding is inconsistent with previous research indicating that speech perception ability declines with age. This issue will be discussed. There was no significant correlation between the average hearing levels and the residual differences between the subjective and predicted SI scores. This suggests that GESI effectively absorbed the effects of peripheral HL. Thus, the proposed framework may facilitate systematic examination and comparison of central and cognitive factors beyond peripheral HL among individual YNHs and OAs with and without HL.
AS | Apr 17, 2021
Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility
Ayako Yamamoto, Toshio Irino, Kenichi Arai et al.
Many subjective experiments have been performed to develop objective speech intelligibility measures, but the novel coronavirus outbreak has made it very difficult to conduct experiments in a laboratory. One solution is to perform remote testing using crowdsourcing; however, because we cannot control the listening conditions, it is unclear whether the results are entirely reliable. In this study, we compared speech intelligibility scores obtained in remote and laboratory experiments. The results showed that the mean and standard deviation (SD) of the remote experiments' speech reception threshold (SRT) were higher than those of the laboratory experiments. However, the variance in the SRTs across the speech-enhancement conditions revealed similarities, implying that remote testing results may be as useful as laboratory experiments to develop an objective measure. We also show that the practice session scores correlate with the SRT values. This is a priori information before performing the main tests and would be useful for data screening to reduce the variability of the SRT distribution.
SD | Apr 3, 2019
GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech
Katsuhiko Yamamoto, Toshio Irino, Shoko Araki et al.
In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, SDRenv, to predict the intelligibility of speech enhanced by nonlinear algorithms. The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. We evaluate GEDI in terms of speech intelligibility predictions of speech sounds enhanced by a classic spectral subtraction and a Wiener filtering method. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than the short-time objective intelligibility (STOI) measure, the extended-STOI (ESTOI) measure, and the hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, the evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.
SD | Jun 18, 2018
Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise et al.
We propose a new excitation source signal for VOCODERs and an all-pass impulse response for post-processing of synthetic sounds and pre-processing of natural sounds for data augmentation. The proposed signals are variants of velvet noise, which is a sparse discrete signal consisting of a few non-zero (1 or -1) elements and sounds smoother than Gaussian white noise. One of the proposed variants, FVN (Frequency domain Velvet Noise), applies the procedure for generating velvet noise to the cyclic frequency domain of the DFT (Discrete Fourier Transform). Smoothing the generated signal to design the phase of an all-pass filter, followed by an inverse Fourier transform, then yields the proposed FVN. Temporally variable, frequency-weighted mixing of FVNs generated from frozen and shuffled random numbers provides a unified excitation signal that can span from random noise to a repetitive pulse train. The other variant, which is an all-pass impulse response, significantly reduces the "buzzy" impression of VOCODER output by filtering. Finally, we will discuss applications of the proposed signal for watermarking and psychoacoustic research.
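For readers unfamiliar with velvet noise, the sketch below generates the ordinary time-domain version: one pulse of value +1 or -1 placed at a random position inside each fixed-length segment, with zeros everywhere else. This is a minimal illustration of the sparse signal described above, not the FVN procedure itself, which applies the same idea on the DFT frequency axis and then designs an all-pass phase. The function name, the sampling rate, and the segment length of 20 samples are assumptions chosen for illustration.

```python
import numpy as np

def velvet_noise(n_samples, avg_interval, rng):
    """Sparse sequence with one +1 or -1 pulse per segment of `avg_interval` samples.

    Hypothetical helper for illustration; not taken from the authors' code.
    """
    signal = np.zeros(n_samples)
    n_pulses = n_samples // avg_interval
    # Start index of each segment.
    starts = np.arange(n_pulses) * avg_interval
    # Random pulse position inside each segment, and a random sign.
    offsets = rng.integers(0, avg_interval, size=n_pulses)
    signs = rng.choice([-1.0, 1.0], size=n_pulses)
    signal[starts + offsets] = signs
    return signal

rng = np.random.default_rng(0)
vn = velvet_noise(44100, 20, rng)   # one second at an assumed 44.1 kHz rate
density = np.count_nonzero(vn) / vn.size
```

With a segment length of 20 samples, only 5% of the samples are non-zero, which is what makes convolution with velvet noise cheap compared with dense Gaussian noise.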