David Pomerenke

CL
h-index10
3papers
8citations
Novelty30%
AI Score30

3 Papers

CLDec 5, 2022
INCLUSIFY: A benchmark and a model for gender-inclusive German

David Pomerenke

Gender-inclusive language is important for achieving gender equality in languages with gender inflections, such as German. While stirring some controversy, it is increasingly adopted by companies and political institutions. A handful of tools have been developed to help people use gender-inclusive language by identifying instances of the generic masculine and providing suggestions for more inclusive reformulations. In this report, we define the underlying tasks in terms of natural language processing, and present a dataset and measures for benchmarking them. We also present a model that implements these tasks, by combining an inclusive language database with an elaborate sequence of processing steps via standard pre-trained models. Our model achieves a recall of 0.89 and a precision of 0.82 in our benchmark for identifying exclusive language; and one of its top five suggestions is chosen in real-world texts in 44% of cases. We sketch how the area could be further advanced by training end-to-end models and using large language models; and we urge the community to include more gender-inclusive texts in their training data in order to not present an obstacle to the adoption of gender-inclusive language. Through these efforts, we hope to contribute to restoring justice in language and, to a small extent, in reality.

CLJul 11, 2025Code
The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks

David Pomerenke, Jonas Nothnagel, Simon Ostermann

To ensure equitable access to the benefits of large language models (LLMs), it is essential to evaluate their capabilities across the world's languages. We introduce the AI Language Proficiency Monitor, a comprehensive multilingual benchmark that systematically assesses LLM performance across up to 200 languages, with a particular focus on low-resource languages. Our benchmark aggregates diverse tasks including translation, question answering, math, and reasoning, using datasets such as FLORES+, MMLU, GSM8K, TruthfulQA, and ARC. We provide an open-source, auto-updating leaderboard and dashboard that supports researchers, developers, and policymakers in identifying strengths and gaps in model performance. In addition to ranking models, the platform offers descriptive insights such as a global proficiency map and trends over time. By complementing and extending prior multilingual benchmarks, our work aims to foster transparency, inclusivity, and progress in multilingual AI. The system is available at https://huggingface.co/spaces/fair-forward/evals-for-every-language.

AIFeb 1, 2022
Explainable AI through the Learning of Arguments

Jonas Bei, David Pomerenke, Lukas Schreiner et al.

Learning arguments is highly relevant to the field of explainable artificial intelligence. It is a family of symbolic machine learning techniques that is particularly human-interpretable. These techniques learn a set of arguments as an intermediate representation. Arguments are small rules with exceptions that can be chained to larger arguments for making predictions or decisions. We investigate the learning of arguments, specifically the learning of arguments from a 'case model' proposed by Verheij [34]. The case model in Verheij's approach are cases or scenarios in a legal setting. The number of cases in a case model are relatively low. Here, we investigate whether Verheij's approach can be used for learning arguments from other types of data sets with a much larger number of instances. We compare the learning of arguments from a case model with the HeRO algorithm [15] and learning a decision tree.