Dietlind Zühlke

CL
3papers
586citations
Novelty25%
AI Score21

3 Papers

CVMay 28, 2023
ConvGenVisMo: Evaluation of Conversational Generative Vision Models

Narjes Nikzad Khasmakhi, Meysam Asgari-Chenaghlu, Nabiha Asghar et al.

Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu et al., 2023) have recently emerged from the synthesis of computer vision and natural language processing techniques. These models enable more natural and interactive communication between humans and machines, because they can understand verbal inputs from users and generate responses in natural language along with visual outputs. To make informed decisions about the usage and deployment of these models, it is important to analyze their performance through a suitable evaluation framework on realistic datasets. In this paper, we present ConvGenVisMo, a framework for the novel task of evaluating CGVMs. ConvGenVisMo introduces a new benchmark evaluation dataset for this task, and also provides a suite of existing and new automated evaluation metrics to evaluate the outputs. All ConvGenVisMo assets, including the dataset and the evaluation code, will be made available publicly on GitHub.

NEJun 2, 2021
Automating Speedrun Routing: Overview and Vision

Matthias Groß, Dietlind Zühlke, Boris Naujoks

Speedrunning in general means to play a video game fast, i.e. using all means at one's disposal to achieve a given goal in the least amount of time possible. To do so, a speedrun must be planned in advance, or routed, as referred to by the community. This paper focuses on discovering challenges and defining models needed when trying to approach the problem of routing algorithmically. To do so, this paper is split in two parts. The first part provides an overview of relevant speedrunning literature, extracting vital information and formulating criticism. Important categorizations are pointed out and a nomenclature is built to support professional discussion. The second part of this paper then refers to the actual speedrun routing optimization problem. Different concepts of graph representations are presented and their potential is discussed. Visions both for problem modeling as well as solving are presented and assessed regarding suitability and expected challenges. Finally, a first assessment of the applicability of existing optimization methods to the defined problem is made, including metaheuristics/EA and Deep Learning methods.

CLMay 26, 2021
Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition

Julia Pritzen, Michael Gref, Dietlind Zühlke et al.

Anglicisms are a challenge in German speech recognition. Due to their irregular pronunciation compared to native German words, automatically generated pronunciation dictionaries often include faulty phoneme sequences for Anglicisms. In this work, we propose a multitask sequence-to-sequence approach for grapheme-to-phoneme conversion to improve the phonetization of Anglicisms. We extended a grapheme-to-phoneme model with a classifier to distinguish Anglicisms from native German words. With this approach, the model learns to generate pronunciations differently depending on the classification result. We used our model to create supplementary Anglicism pronunciation dictionaries that are added to an existing German speech recognition model. Tested on a dedicated Anglicism evaluation set, we improved the recognition of Anglicisms compared to a baseline model, reducing the word error rate by 1 % and the Anglicism error rate by 3 %. We show that multitask learning can help solving the challenge of Anglicisms in German speech recognition.