Jaroslav Čmejla

h-index: 5

3 papers

51 citations

Novelty: 33%

AI Score: 19

Ranked #186,220 of 194,257 authors (top 96%); #1,419 in AS (top 98%)

3 Papers

1.2 · AS · Nov 5, 2021

Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification

Jiří Málek, Jakub Janský, Zbyněk Koldovský et al.

This manuscript proposes a novel robust procedure for extracting a speaker of interest (SOI) from a mixture of audio sources. The SOI is estimated via independent vector extraction (IVE). Since blind IVE cannot distinguish the target source by itself, it is guided towards the SOI via frame-wise speaker identification based on deep learning. Still, an incorrect speaker can be extracted when the guidance fails, especially on challenging data. To identify such cases, we propose a criterion for non-intrusively assessing the estimated speaker. It uses the same model as the speaker identification, so no additional training is required. When an incorrect extraction is detected, we propose a "deflation" step in which the incorrect source is subtracted from the mixture, after which another attempt to extract the SOI is made. The process is repeated until the extraction succeeds. The proposed procedure is experimentally tested on artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, and microphone failures. The method is compared with state-of-the-art blind algorithms as well as with current fully supervised deep learning-based methods.
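The extract, verify, and deflate loop described above can be sketched under heavy simplifications, all of which are assumptions rather than the paper's method. The sketch uses a real-valued instantaneous two-channel mixture instead of the STFT-domain model that IVE operates in. The strongest principal component stands in for the guided IVE extractor, and a correlation with the clean target stands in for the non-intrusive speaker-identification criterion (the stand-in is intrusive). Deflation subtracts the least-squares image of the rejected estimate from each channel. All function names are hypothetical.

```python
import numpy as np


def deflate(mixture, estimate):
    """Remove the least-squares image of `estimate` from every channel."""
    gains = mixture @ estimate / (estimate @ estimate)  # one gain per channel
    return mixture - np.outer(gains, estimate)


def extract_with_deflation(mixture, extract, is_target, max_attempts=3):
    """Extract, verify, deflate on failure; return (estimate or None, attempts used)."""
    x = mixture.copy()
    for attempt in range(1, max_attempts + 1):
        estimate = extract(x)
        if is_target(estimate):
            return estimate, attempt
        x = deflate(x, estimate)  # drop the wrongly extracted source and retry
    return None, max_attempts


def strongest_component(x):
    """Toy extractor (stand-in for guided IVE): dominant right singular vector."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[0]


# Toy scene: orthogonal sinusoids, a louder interferer, and a rotation mixing matrix.
n = np.arange(1000)
target = np.sin(2 * np.pi * 50 * n / 1000)
interferer = 3.0 * np.sin(2 * np.pi * 120 * n / 1000)
theta = 0.3
mixing = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
mixture = mixing @ np.vstack([target, interferer])


def is_target(est):
    """Intrusive stand-in for the speaker-identification check (uses the clean target)."""
    return abs(np.corrcoef(est, target)[0, 1]) > 0.9


estimate, attempts = extract_with_deflation(mixture, strongest_component, is_target)
print(attempts)  # 2: the louder interferer is extracted, rejected, and deflated first
```

Because the toy sources are orthogonal and the mixing is a rotation, the first extraction returns the louder interferer exactly. Deflation then leaves a mixture that contains only the target, so the second attempt succeeds.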

7.3 · AS · Oct 25, 2019

Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors

Jakub Janský, Jiří Málek, Jaroslav Čmejla et al.

We propose a novel algorithm for adaptive blind audio source extraction. The proposed method is based on independent vector analysis and uses auxiliary-function optimization to achieve fast convergence. The algorithm is partially supervised by a pilot signal related to the source of interest (SOI), which ensures that the method extracts the utterance of the desired speaker. The pilot is based on identifying the dominant speaker in the mixture using x-vectors. The properties of x-vectors computed in the presence of cross-talk are analyzed experimentally. The proposed approach is verified in a scenario with a moving SOI, a static interfering speaker, and environmental noise.
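The abstract does not say how the x-vector identification becomes a pilot signal. The following is a minimal sketch under several assumptions. Per-frame embeddings are scored against an enrolled SOI embedding by cosine similarity, and the scores are thresholded into a binary frame-wise pilot. The 0.5 threshold, the 512-dimensional random stand-in embeddings, and the function names are all hypothetical. Real x-vectors would come from a trained extractor, which is not shown.

```python
import numpy as np


def cosine_scores(frame_embeddings, enrolled):
    """Cosine similarity between each frame embedding (rows) and the enrolled embedding."""
    frames = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    target = enrolled / np.linalg.norm(enrolled)
    return frames @ target


def binary_pilot(frame_embeddings, enrolled, threshold=0.5):
    """1.0 where the SOI is judged dominant in a frame, else 0.0 (hypothetical threshold)."""
    return (cosine_scores(frame_embeddings, enrolled) > threshold).astype(float)


# Synthetic stand-ins for x-vectors (not real speaker embeddings).
rng = np.random.default_rng(0)
dim = 512
soi, interferer = rng.standard_normal(dim), rng.standard_normal(dim)
dominant = [soi, interferer, soi, interferer]  # which speaker dominates each frame
frames = np.stack([d + 0.3 * rng.standard_normal(dim) for d in dominant])

pilot = binary_pilot(frames, soi)
print(pilot)  # [1. 0. 1. 0.]
```

A binary pilot is only one option. The raw cosine scores could instead be passed on as a soft weight, and which form (if either) the paper uses is not stated in the abstract.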

3.3 · AS · Jul 29, 2019

MIRaGe: Multichannel Database Of Room Impulse Responses Measured On High-Resolution Cube-Shaped Grid In Multiple Acoustic Conditions

Jaroslav Čmejla, Tomáš Kounovský, Sharon Gannot et al.

We introduce a database of multichannel recordings made in an acoustic lab with adjustable reverberation time. The recordings provide information about room impulse responses (RIRs) for various positions of a loudspeaker. In particular, the main positions correspond to the 4104 vertices of a dense cube-shaped grid within a 46 × 36 × 32 cm volume. The database thus provides a tool for detailed analysis of the beampatterns of spatial processing methods, as well as for training and testing mathematical models of the acoustic field.
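The abstract gives the grid extent and the vertex count but not the spacing. The sketch below assumes 2 cm spacing along the two horizontal axes and 4 cm vertically. This assumption was chosen only because it reproduces the stated count: (46/2 + 1) × (36/2 + 1) × (32/4 + 1) = 24 × 19 × 9 = 4104. Under that assumption, the loudspeaker positions can be enumerated as follows. The constants and the function name are hypothetical.

```python
import numpy as np

# Extent of the grid taken from the abstract (cm).
EXTENT_CM = (46.0, 36.0, 32.0)
# ASSUMPTION: per-axis spacing (cm); not stated in the abstract.
SPACING_CM = (2.0, 2.0, 4.0)


def grid_vertices(extent, spacing):
    """Return an (N, 3) array of vertex coordinates, origin at one grid corner."""
    # Stop half a step past the extent so the far edge is included despite float rounding.
    axes = [np.arange(0.0, e + s / 2, s) for e, s in zip(extent, spacing)]
    xx, yy, zz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])


vertices = grid_vertices(EXTENT_CM, SPACING_CM)
print(vertices.shape)  # (4104, 3)
```

Each row could serve as a key for the RIR measured at that loudspeaker position. The abstract does not describe how the database actually stores or names positions.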