Lisa Bylinina

CL
h-index10
8papers
1,164citations
Novelty32%
AI Score34

8 Papers

CLJun 4, 2023
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations

Aleksey Tikhonov, Lisa Bylinina, Denis Paperno

Multimodal embeddings aim to enrich the semantic information in neural representations of language compared to text-only models. While different embeddings exhibit different applicability and performance on downstream tasks, little is known about the systematic representation differences attributed to the visual modality. Our paper compares word embeddings from three vision-and-language models (CLIP, OpenCLIP and Multilingual CLIP) and three text-only models, with static (FastText) as well as contextual representations (multilingual BERT; XLM-RoBERTa). This is the first large-scale study of the effect of visual grounding on language representations, including 46 semantic parameters. We identify meaning properties and relations that characterize words whose embeddings are most affected by the inclusion of visual modality in the training data; that is, points where visual grounding turns out most important. We find that the effect of visual modality correlates most with denotational semantic properties related to concreteness, but is also detected for several specific semantic classes, as well as for valence, a sentiment-related connotational property of linguistic expressions.

CLNov 3, 2023
Too Much Information: Keeping Training Simple for BabyLMs

Lukas Edman, Lisa Bylinina

This paper details the work of the University of Groningen for the BabyLM Challenge. We follow the idea that, like babies, language models should be introduced to simpler concepts first and build off of that knowledge to understand more complex concepts. We examine this strategy of simple-then-complex through a variety of lenses, namely context size, vocabulary, and overall linguistic complexity of the data. We find that only one, context size, is truly beneficial to training a language model. However this simple change to context size gives us improvements of 2 points on average on (Super)GLUE tasks, 1 point on MSGS tasks, and 12\% on average on BLiMP tasks. Our context-limited model outperforms the baseline that was trained on 10$\times$ the amount of data.

CLSep 27, 2024
Individuation in Neural Models with and without Visual Grounding

Alexey Tikhonov, Lisa Bylinina, Ivan P. Yamshchikov

We show differences between a language-and-vision model CLIP and two text-only models - FastText and SBERT - when it comes to the encoding of individuation information. We study latent representations that CLIP provides for substrates, granular aggregates, and various numbers of objects. We demonstrate that CLIP embeddings capture quantitative differences in individuation better than models trained on text-only data. Moreover, the individuation hierarchy we deduce from the CLIP embeddings agrees with the hierarchies proposed in linguistics and cognitive science.

CLJul 2, 2024
Black Big Boxes: Tracing Adjective Order Preferences in Large Language Models

Jaap Jumelet, Lisa Bylinina, Willem Zuidema et al.

In English and other languages, multiple adjectives in noun phrases follow intricate ordering patterns. These patterns have been widely studied in linguistics and provide a useful test case for assessing how language models (LMs) acquire graded and context-sensitive word order preferences. We ask to what extent adjective order preferences in LMs can be explained by distributional learning alone, and where models exhibit behaviour that goes beyond surface co-occurrence patterns. We find that LM predictions are largely explained by training data frequencies: simple n-gram statistics account for much of their behaviour and closely mirror the preferences learned during training. However, by analysing learning dynamics we reveal that models also generalize robustly to unseen adjective combinations, indicating that their behaviour cannot be reduced to memorization of observed orders alone. Moreover, we show how LMs leverage word order cues from sentence context, demonstrating with feature attribution methods that contextual cues are an additional driver of adjective order in LM output.

CLOct 2, 2025
Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models

Ece Takmaz, Lisa Bylinina, Jakub Dotlacil

State-of-the-art vision-and-language models consist of many parameters and learn from enormous datasets, surpassing the amounts of linguistic data that children are exposed to as they acquire a language. This paper presents our approach to the multimodal track of the BabyLM challenge addressing this discrepancy. We develop language-only and multimodal models in low-resource settings using developmentally plausible datasets, with our multimodal models outperforming previous BabyLM baselines. One finding in the multimodal language model literature is that these models tend to underperform in \textit{language-only} tasks. Therefore, we focus on maintaining language-only abilities in multimodal models. To this end, we experiment with \textit{model merging}, where we fuse the parameters of multimodal models with those of language-only models using weighted linear interpolation. Our results corroborate the findings that multimodal models underperform in language-only benchmarks that focus on grammar, and model merging with text-only models can help alleviate this problem to some extent, while maintaining multimodal performance.

CLOct 28, 2024
Are BabyLMs Second Language Learners?

Lukas Edman, Lisa Bylinina, Faeze Ghorbanpour et al.

This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge (Warstadt et al. 2023). Rather than pursuing a first language learning (L1) paradigm, we approach the challenge from a second language (L2) learning perspective. In L2 learning, there is a stronger focus on learning explicit linguistic information, such as grammatical notions, definitions of words or different ways of expressing a meaning. This makes L2 learning potentially more efficient and concise. We approximate this using data from Wiktionary, grammar examples either generated by an LLM or sourced from grammar books, and paraphrase data. We find that explicit information about word meaning (in our case, Wiktionary) does not boost model performance, while grammatical information can give a small improvement. The most impactful data ingredient is sentence paraphrases, with our two best models being trained on 1) a mix of paraphrase data and data from the BabyLM pretraining dataset, and 2) exclusively paraphrase data.

CLSep 13, 2021
Connecting degree and polarity: An artificial language learning study

Lisa Bylinina, Alexey Tikhonov, Ekaterina Garmash

We investigate a new linguistic generalization in pre-trained language models (taking BERT (Devlin et al., 2019) as a case study). We focus on degree modifiers (expressions like slightly, very, rather, extremely) and test the hypothesis that the degree expressed by a modifier (low, medium or high degree) is related to the modifier's sensitivity to sentence polarity (whether it shows preference for affirmative or negative sentences or neither). To probe this connection, we apply the Artificial Language Learning experimental paradigm from psycholinguistics to a neural language model. Our experimental results suggest that BERT generalizes in line with existing linguistic observations that relate degree semantics to polarity sensitivity, including the main one: low degree semantics is associated with preference towards positive polarity.

CLSep 8, 2021
Transformers in the loop: Polarity in neural models of language

Lisa Bylinina, Alexey Tikhonov

Representation of linguistic phenomena in computational language models is typically assessed against the predictions of existing linguistic theories of these phenomena. Using the notion of polarity as a case study, we show that this is not always the most adequate set-up. We probe polarity via so-called 'negative polarity items' (in particular, English 'any') in two pre-trained Transformer-based models (BERT and GPT-2). We show that - at least for polarity - metrics derived from language models are more consistent with data from psycholinguistic experiments than linguistic theory predictions. Establishing this allows us to more adequately evaluate the performance of language models and also to use language models to discover new insights into natural language grammar beyond existing linguistic theories. This work contributes to establishing closer ties between psycholinguistic experiments and experiments with language models.