Marie Kunešová

2papers

2 Papers

ASSep 16, 2025
Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024

Marie Kunešová, Aleš Pražák, Jan Lehečka

We present a system for non-intrusive prediction of speech quality in noisy and enhanced speech, developed for Track 3 of the VoiceMOS 2024 Challenge. The task required estimating the ITU-T P.835 metrics SIG, BAK, and OVRL without reference signals and with only 100 subjectively labeled utterances for training. Our approach uses wav2vec 2.0 with a two-stage transfer learning strategy: initial fine-tuning on automatically labeled noisy data, followed by adaptation to the challenge data. The system achieved the best performance on BAK prediction (LCC=0.867) and a very close second place in OVRL (LCC=0.711) in the official evaluation. Post-challenge experiments show that adding artificially degraded data to the first fine-tuning stage substantially improves SIG prediction, raising correlation with ground truth scores from 0.207 to 0.516. These results demonstrate that transfer learning with targeted data generation is effective for predicting P.835 scores under severe data constraints.

ASMay 27, 2019
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge

Zbyněk Zajíc, Marie Kunešová, Marek Hrúz et al.

In this paper, we present our system developed by the team from the New Technologies for the Information Society (NTIS) research center of the University of West Bohemia in Pilsen, for the Second DIHARD Speech Diarization Challenge. The base of our system follows the currently-standard approach of segmentation, i/x-vector extraction, clustering, and resegmentation. The hyperparameters for each of the subsystems were selected according to the domain classifier trained on the development set of DIHARD II. We compared our system with results from the Kaldi diarization (with i/x-vectors) and combined these systems. At the time of writing of this abstract, our best submission achieved a DER of 23.47% and a JER of 48.99% on the evaluation set (in Track 1 using reference SAD).