Jan Jasiński

h-index: 3

5 papers

24 citations

5 Papers

10.2 SD Jun 22

From Text Metrics to Model Internals: A Study of Whisper ASR Hallucination Detection

Jan Jasiński, Mateusz Barański, Julitta Bartolewska et al.

Hallucinations of ASR models - fluent transcriptions with no basis in audio - degrade system performance and pose risks in downstream applications. Robust detection of such errors remains a challenge. This paper studies Whisper large v3 hallucination detection on real-speech human-annotated data across three paradigms: text-based, LLM-based, and internal decoder state probing. Text classifiers utilizing metrics for text evaluation achieve high recall but degrade without reference transcripts. LLM-based detection improves precision with domain-specific prompt conditioning, yet remains less competitive than the lightweight text-based methods. Probing Whisper's decoder representations, without a ground-truth reference, yields the strongest performance, revealing that hallucination traits are encoded across intermediate decoding layers. A late-fusion meta-classifier combining text and internal-state outputs achieves the best overall detection performance.
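The late-fusion idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which meta-classifier is used, so logistic regression is an assumption here, as are the function names `fit_late_fusion` and `fuse`. The toy scores are invented purely to make the example runnable and are not results from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_late_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over per-detector scores
    using plain batch gradient descent (assumed meta-classifier)."""
    n_features = len(scores[0])
    n = len(scores)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(x, weights, bias) -> float:
    """Combine detector scores into one hallucination probability."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: each row is (text-based score, internal-state score).
toy_scores = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.6),
              (0.2, 0.1), (0.3, 0.2), (0.1, 0.4)]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = hallucination, 0 = clean

weights, bias = fit_late_fusion(toy_scores, toy_labels)
print(fuse((0.9, 0.9), weights, bias) > 0.5)  # both detectors agree: flagged
```

Fusing at the score level keeps the two detectors independent, so either one could be retrained or replaced without touching the other.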

11.6 SD Jun 22

HALAS: A Human-Annotated Dataset of Hallucinations of Modern ASR Systems

Mateusz Barański, Jan Jasiński, Julitta Bartolewska et al.

End-to-end Automatic Speech Recognition (ASR) systems hallucinate on natural speech, yet existing mitigation methods are typically evaluated on non-speech or artificially corrupted audio. We introduce HALAS, the first human-annotated dataset of naturally occurring hallucinations from seven state-of-the-art ASR models on real unprocessed earnings call recordings. HALAS provides span-level labels, enabling analysis of hallucination patterns and their severity. Our analysis reveals strong cross-model vocabulary overlap and confirms that hallucinations also occur for almost correctly transcribed speech (characterized by a low Word Error Rate). The proposed benchmark with HALAS shows that the character and semantic-level metrics used as a proxy for hallucination detection reach 81% ROC-AUC, while state-of-the-art detection methods achieve an F1 score of only 53.1%. As such, HALAS establishes the first rigorous non-artificial benchmark for the detection and mitigation of ASR hallucinations.
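Word Error Rate, mentioned in the abstract above, is conventionally the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below follows that standard definition. The function name `word_error_rate` and the whitespace tokenization are assumptions, and the example sentences are invented for illustration rather than taken from HALAS.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: two appended words against a five-word reference.
print(word_error_rate("the revenue grew ten percent",
                      "the revenue grew ten percent thank you"))
```

Because the denominator is the reference length, a short hallucinated phrase appended to a long, otherwise correct transcript moves WER only slightly, which is consistent with the abstract's observation that hallucinations also occur at low WER.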

8.5 AS Jun 23

The effect of micro-changes in the pluck trajectory on the sound of an acoustic guitar

Marek Pluta, Jan Jasiński, Daniel Tokarczyk et al.

This study explores how micro-changes in the plucking trajectory of a guitar pick influence the sound of an acoustic guitar. Using a state-of-the-art robotic plucker, a series of measurements was performed in which the plectrum was moved towards the instrument in steps of 192 micrometers, resulting in an increased attack depth. The study analyses how these changes are reflected in loudness, timbre, and harmonic content, and in how the sound progresses during decay. This methodology was repeated for guitar plectra made from six different materials to investigate how the pick itself influences the effect of a change in the plucking trajectory. The results show that at a low depth the string is not fully excited, resulting in a weak and markedly altered sound. The range of this effect changes with the mechanical properties of the plectrum material. Beyond this range, an increase in depth results in an increase in loudness, a decrease in inharmonicity and noisiness, and a shift in timbre in which the sound becomes fuller in low frequencies and rougher. The presented findings help to understand the nuanced relationship between plucking trajectory and acoustic output. They provide important insights regarding the importance of plucking in guitar testing methodologies, showing that the mech

26.7 SD Jan 20, 2025

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio

Mateusz Barański, Jan Jasiński, Julitta Bartolewska et al.

Hallucinations of deep neural models are among the key challenges in automatic speech recognition (ASR). In this paper, we investigate hallucinations of the Whisper ASR model induced by non-speech audio segments present during inference. By inducing hallucinations with various types of sounds, we show that there exists a set of hallucinations that appear frequently. We then study hallucinations caused by the augmentation of speech with such sounds. Finally, we describe the creation of a bag of hallucinations (BoH) that makes it possible to remove the effect of hallucinations through post-processing of text transcriptions. The results of our experiments show that such post-processing is capable of reducing word error rate (WER) and acts as a good safeguard against problematic hallucinations.
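The abstract above does not spell out how the BoH is built or matched against transcripts, so the following is only a rough sketch of what phrase-list post-processing could look like. The function name `remove_hallucinations`, the case-insensitive whole-phrase matching, and the example bag entries are all assumptions made for illustration.

```python
import re


def remove_hallucinations(transcript: str, bag: list[str]) -> str:
    """Delete every occurrence of a known hallucinated phrase."""
    cleaned = transcript
    # Process longer phrases first so a short entry cannot split a longer one.
    for phrase in sorted(bag, key=len, reverse=True):
        # Match the phrase only when it is not embedded inside a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed phrases.
    return " ".join(cleaned.split())


# Hypothetical bag entries, not taken from the paper.
example_bag = ["thanks for watching", "subtitles by the community"]

print(remove_hallucinations(
    "Revenue grew ten percent. Thanks for watching", example_bag))
```

Exact phrase matching is deliberately conservative: it only removes strings already known to be hallucinated, at the cost of missing paraphrased variants.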

7.1 SD Jun 18

PolSeT: Polish Semantics of Timbre Dataset

Jan Jasiński

This data report introduces PolSeT (Polish Semantic Timbre), a dataset designed to facilitate research in psychoacoustics and Music Information Retrieval (MIR) in Polish and cross-cultural contexts. The dataset contains data from two sequential experiments. Experiment 1 (N=60) was a free-verbalization task aimed at creating a lexicon of Polish semantic descriptors. Using 11 stimuli, a total of 1901 descriptors (701 unique) were gathered. Experiment 2 (N=105) utilized this lexicon to conduct a semantic differential study, where participants rated 18 instrument sounds on 8 bipolar scales, with repeated trials for reliability analysis. The released dataset includes raw listener responses, comprehensive demographics (experience, gender, age), audio stimuli, and extracted acoustic features with Python extraction code. This dataset addresses a gap in open timbre research data, providing both the qualitative linguistic groundwork and the quantitative ratings necessary for psychoacoustic research and the training of multilingual semantic embedding models.