Elena Merino-Gómez

CL
h-index9
10papers
84citations
Novelty22%
AI Score34

10 Papers

CLAug 19, 2022
Text to Image Generation: Leaving no Language Behind

Pedro Reviriego, Elena Merino-Gómez

One of the latest applications of Artificial Intelligence (AI) is to generate images from natural language descriptions. These generators are now becoming available and achieve impressive results that have been used for example in the front cover of magazines. As the input to the generators is in the form of a natural language text, a question that arises immediately is how these models behave when the input is written in different languages. In this paper we perform an initial exploration of how the performance of three popular text-to-image generators depends on the language. The results show that there is a significant performance degradation when using languages other than English, especially for languages that are not widely used. This observation leads us to discuss different alternatives on how text-to-image generators can be improved so that performance is consistent across different languages. This is fundamental to ensure that this new technology can be used by non-native English speakers and to preserve linguistic diversity.

CVOct 7, 2022
Can Artificial Intelligence Reconstruct Ancient Mosaics?

Fernando Moral-Andrés, Elena Merino-Gómez, Pedro Reviriego et al.

A large number of ancient mosaics have not reached us because they have been destroyed by erosion, earthquakes, looting or even used as materials in newer construction. To make things worse, among the small fraction of mosaics that we have been able to recover, many are damaged or incomplete. Therefore, restoration and reconstruction of mosaics play a fundamental role to preserve cultural heritage and to understand the role of mosaics in ancient cultures. This reconstruction has traditionally been done manually and more recently using computer graphics programs but always by humans. In the last years, Artificial Intelligence (AI) has made impressive progress in the generation of images from text descriptions and reference images. State of the art AI tools such as DALL-E2 can generate high quality images from text prompts and can take a reference image to guide the process. In august 2022, DALL-E2 launched a new feature called outpainting that takes as input an incomplete image and a text prompt and then generates a complete image filling the missing parts. In this paper, we explore whether this innovative technology can be used to reconstruct mosaics with missing parts. Hence a set of ancient mosaics have been used and reconstructed using DALL-E2; results are promising showing that AI is able to interpret the key features of the mosaics and is able to produce reconstructions that capture the essence of the scene. However, in some cases AI fails to reproduce some details, geometric forms or introduces elements that are not consistent with the rest of the mosaic. This suggests that as AI image generation technology matures in the next few years, it could be a valuable tool for mosaic reconstruction going forward.

CLSep 28, 2023
How many words does ChatGPT know? The answer is ChatWords

Gonzalo Martínez, Javier Conde, Pedro Reviriego et al.

The introduction of ChatGPT has put Artificial Intelligence (AI) Natural Language Processing (NLP) in the spotlight. ChatGPT adoption has been exponential with millions of users experimenting with it in a myriad of tasks and application domains with impressive results. However, ChatGPT has limitations and suffers hallucinations, for example producing answers that look plausible but they are completely wrong. Evaluating the performance of ChatGPT and similar AI tools is a complex issue that is being explored from different perspectives. In this work, we contribute to those efforts with ChatWords, an automated test system, to evaluate ChatGPT knowledge of an arbitrary set of words. ChatWords is designed to be extensible, easy to use, and adaptable to evaluate also other NLP AI tools. ChatWords is publicly available and its main goal is to facilitate research on the lexical knowledge of AI tools. The benefits of ChatWords are illustrated with two case studies: evaluating the knowledge that ChatGPT has of the Spanish lexicon (taken from the official dictionary of the "Real Academia Española") and of the words that appear in the Quixote, the well-known novel written by Miguel de Cervantes. The results show that ChatGPT is only able to recognize approximately 80% of the words in the dictionary and 90% of the words in the Quixote, in some cases with an incorrect meaning. The implications of the lexical knowledge of NLP AI tools and potential applications of ChatWords are also discussed providing directions for further work on the study of the lexical knowledge of AI tools.

CLAug 14, 2023
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans

Pedro Reviriego, Javier Conde, Elena Merino-Gómez et al.

The introduction of Artificial Intelligence (AI) generative language models such as GPT (Generative Pre-trained Transformer) and tools such as ChatGPT has triggered a revolution that can transform how text is generated. This has many implications, for example, as AI-generated text becomes a significant fraction of the text, would this have an effect on the language capabilities of readers and also on the training of newer AI tools? Would it affect the evolution of languages? Focusing on one specific aspect of the language: words; will the use of tools such as ChatGPT increase or reduce the vocabulary used or the lexical richness? This has implications for words, as those not included in AI-generated content will tend to be less and less popular and may eventually be lost. In this work, we perform an initial comparison of the vocabulary and lexical richness of ChatGPT and humans when performing the same tasks. In more detail, two datasets containing the answers to different types of questions answered by ChatGPT and humans, and a third dataset in which ChatGPT paraphrases sentences and questions are used. The analysis shows that ChatGPT tends to use fewer distinct words and lower lexical richness than humans. These results are very preliminary and additional datasets and ChatGPT configurations have to be evaluated to extract more general conclusions. Therefore, further research is needed to understand how the use of ChatGPT and more broadly generative AI tools will affect the vocabulary and lexical richness in different types of text and languages.

CLOct 23, 2023
Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models

Gonzalo Martínez, Javier Conde, Elena Merino-Gómez et al.

Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect the fundamental linguistic aspects of language understanding and production. In this paper, we advocate for the revival of vocabulary tests as a valuable tool for assessing LLM performance. We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge. These findings shed light on the intricacies of LLM word representations, their learning mechanisms, and performance variations across models and languages. Moreover, the ability to automatically generate and perform vocabulary tests offers new opportunities to expand the approach and provide a more complete picture of LLMs' language skills.

CLSep 8, 2024
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?

Marina Mayor-Rocher, Nina Melero, Elena Merino-Gómez et al.

Large Language Models (LLMs) have been profusely evaluated on their ability to answer questions on many topics and their performance on different natural language understanding tasks. Those tests are usually conducted in English, but most LLM users are not native English speakers. Therefore, it is of interest to analyze how LLMs understand other languages at different levels: from paragraphs to morphems. In this paper, we evaluate the performance of state-of-the-art LLMs in TELEIA, a recently released benchmark with similar questions to those of Spanish exams for foreign students, covering topics such as reading comprehension, word formation, meaning and compositional semantics, and grammar. The results show that LLMs perform well at understanding Spanish but are still far from achieving the level of a native speaker in terms of grammatical competence.

CVMar 3, 2023
Unfinished Architectures: A Perspective from Artificial Intelligence

Elena Merino-Gómez, Pedro Reviriego, Fernando Moral

Unfinished buildings are a constant throughout the history of architecture and have given rise to intense debates on the opportuneness of their completion, in addition to offering alibis for theorizing about the compositional possibilities in coherence with the finished parts. The development of Artificial Intelligence (AI) opens new avenues for the proposal of possibilities for the completion of unfinished architectures. Specifically, with the recent appearance of tools such as DALL-E, capable of completing images guided by a textual description, it is possible to count on the help of AI for architectural design tasks. In this article we explore the use of these new AI tools for the completion of unfinished facades of historical temples and analyse the still germinal stadium in the field of architectural graphic composition.

CLMar 21, 2024Code
Open Conversational LLMs do not know most Spanish words

Javier Conde, Miguel González, Nina Melero et al.

The growing interest in Large Language Models (LLMs) and in particular in conversational models with which users can interact has led to the development of a large number of open-source chat LLMs. These models are evaluated on a wide range of benchmarks to assess their capabilities in answering questions or solving problems on almost any possible topic or to test their ability to reason or interpret texts. Instead, the evaluation of the knowledge that these models have of the languages has received much less attention. For example, the words that they can recognize and use in different languages. In this paper, we evaluate the knowledge that open-source chat LLMs have of Spanish words by testing a sample of words in a reference dictionary. The results show that open-source chat LLMs produce incorrect meanings for an important fraction of the words and are not able to use most of the words correctly to write sentences with context. These results show how Spanish is left behind in the open-source LLM race and highlight the need to push for linguistic fairness in conversational LLMs ensuring that they provide similar performance across languages.

CVApr 19
Lost in the Vibrations: Vision Language Models Fail the Dynamic Gauges Test

Tairan Fu, Francisco Javier Santos-Martín, Javier Conde et al.

The digital transformation of industrial manufacturing increasingly relies on the ability of autonomous robots to interact with legacy infrastructure, particularly analog gauges. While Vision-Language Models (VLMs) have demonstrated potential in zero-shot instrument recognition, their deployment in measurement systems remains constrained by an inherent inability to accurately analyze high-frequency temporal events and needle vibrations. This paper evaluates state-of-the-art models, including GPT-5 and Gemini 3, against the strict requirements of metrology and uncertainty quantification. To facilitate this evaluation, we introduce a novel dataset comprising video sequences of various gauge types: circular, linear, and Vernier, under diverse motion speed profiles. Our findings indicate that current VLMs exhibit limited ability in interpreting needle trajectories and scale semantics, failing to provide the traceability and reliability needed for safety-critical monitoring. The results demonstrate that these models have not yet achieved the performance necessary to be classified as trustworthy synthetic instruments under existing IEEE and ISO standards.

CVJun 27, 2024
Recursive InPainting (RIP): how much information is lost under recursive inferences?

Javier Conde, Miguel González, Gonzalo Martínez et al.

The rapid adoption of generative artificial intelligence (AI) is accelerating content creation and modification. For example, variations of a given content, be it text or images, can be created almost instantly and at a low cost. This will soon lead to the majority of text and images being created directly by AI models or by humans assisted by AI. This poses new risks; for example, AI-generated content may be used to train newer AI models and degrade their performance, or information may be lost in the transformations made by AI which could occur when the same content is processed over and over again by AI tools. An example of AI image modifications is inpainting in which an AI model completes missing fragments of an image. The incorporation of inpainting tools into photo editing programs promotes their adoption and encourages their recursive use to modify images. Inpainting can be applied recursively, starting from an image, removing some parts, applying inpainting to reconstruct the image, revising it, and then starting the inpainting process again on the reconstructed image, etc. This paper presents an empirical evaluation of recursive inpainting when using one of the most widely used image models: Stable Diffusion. The inpainting process is applied by randomly selecting a fragment of the image, reconstructing it, selecting another fragment, and repeating the process a predefined number of iterations. The images used in the experiments are taken from a publicly available art data set and correspond to different styles and historical periods. Additionally, photographs are also evaluated as a reference. The modified images are compared with the original ones by both using quantitative metrics and performing a qualitative analysis. The results show that recursive inpainting in some cases modifies the image so that it still resembles the original one while in others leads to degeneration.