Špela Vintar

CL
h-index13
4papers
1citation
Novelty30%
AI Score36

4 Papers

CLNov 6, 2025
The truth is no diaper: Human and AI-generated associations to emotional words

Špela Vintar, Jan Jona Javoršek

Human word associations are a well-known method of gaining insight into the internal mental lexicon, but the responses spontaneously offered by human participants to word cues are not always predictable as they may be influenced by personal experience, emotions or individual cognitive styles. The ability to form associative links between seemingly unrelated concepts can be the driving mechanisms of creativity. We perform a comparison of the associative behaviour of humans compared to large language models. More specifically, we explore associations to emotionally loaded words and try to determine whether large language models generate associations in a similar way to humans. We find that the overlap between humans and LLMs is moderate, but also that the associations of LLMs tend to amplify the underlying emotional load of the stimulus, and that they tend to be more predictable and less creative than human ones.

CLOct 24, 2025Code
From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene

Mojca Brglez, Špela Vintar

Large language models are demonstrating increasing capabilities, excelling at benchmarks once considered very difficult. As their capabilities grow, there is a need for more challenging evaluations that go beyond surface-level linguistic competence. Namely, language competence involves not only syntax and semantics but also pragmatics, i.e., understanding situational meaning as shaped by context as well as linguistic and cultural norms. To contribute to this line of research, we introduce SloPragEval and SloPragMega, the first pragmatics understanding benchmarks for Slovene that contain altogether 405 multiple-choice questions. We discuss the difficulties of translation, describe the campaign to establish a human baseline, and report pilot evaluations with LLMs. Our results indicate that current models have greatly improved in understanding nuanced language but may still fail to infer implied speaker meaning in non-literal utterances, especially those that are culture-specific. We also observe a significant gap between proprietary and open-source models. Finally, we argue that benchmarks targeting nuanced language understanding and knowledge of the target culture must be designed with care, preferably constructed from native data, and validated with human responses.

CLOct 28, 2025
Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices

Špela Vintar, Taja Kuzman Pungeršek, Mojca Brglez et al.

While new benchmarks for large language models (LLMs) are being developed continuously to catch up with the growing capabilities of new models and AI in general, using and evaluating LLMs in non-English languages remains a little-charted landscape. We give a concise overview of recent developments in LLM benchmarking, and then propose a new taxonomy for the categorization of benchmarks that is tailored to multilingual or non-English use scenarios. We further propose a set of best practices and quality standards that could lead to a more coordinated development of benchmarks for European languages. Among other recommendations, we advocate for a higher language and culture sensitivity of evaluation methods.

CLMar 31, 2022
A bilingual approach to specialised adjectives through word embeddings in the karstology domain

Larisa Grčić Simeunović, Matej Martinc, Špela Vintar

We present an experiment in extracting adjectives which express a specific semantic relation using word embeddings. The results of the experiment are then thoroughly analysed and categorised into groups of adjectives exhibiting formal or semantic similarity. The experiment and analysis are performed for English and Croatian in the domain of karstology using data sets and methods developed in the TermFrame project. The main original contributions of the article are twofold: firstly, proposing a new and promising method of extracting semantically related words relevant for terminology, and secondly, providing a detailed evaluation of the output so that we gain a better understanding of the domain-specific semantic structures on the one hand and the types of similarities extracted by word embeddings on the other.