Shalom Lappin

h-index31

6papers

4,031citations

Novelty31%

AI Score47

Ranked #33,856 of 194,257 authors (top 17%)#6,953 in CL (top 23%)

6 Papers

9.2CLMar 26Code

Humans vs Vision-Language Models: A Unified Measure of Narrative Coherence

Nikolai Ilinykh, Hyewon Jang, Shalom Lappin et al.

We study narrative coherence in visually grounded stories by comparing human-written narratives with those generated by vision-language models (VLMs) on the Visual Writing Prompts corpus. Using a set of metrics that capture different aspects of narrative coherence, including coreference, discourse relation types, topic continuity, character persistence, and multimodal character grounding, we compute a narrative coherence score. We find that VLMs show broadly similar coherence profiles that differ systematically from those of humans. In addition, differences for individual measures are often subtle, but they become clearer when considered jointly. Overall, our results indicate that, despite human-like surface fluency, model narratives exhibit systematic differences from those of humans in how they organise discourse across a visually grounded story. Our code is available at https://github.com/GU-CLASP/coherence-driven-humans.

1.1CLAug 11, 2022

Assessing the Unitary RNN as an End-to-End Compositional Model of Syntax

Jean-Philippe Bernardy, Shalom Lappin

We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns: context-free long distance agreement, and mildly context-sensitive cross serial dependencies. This work extends recent experiments on deeply nested context-free long distance dependencies, with similar results. URNs differ from LSTMs in that they avoid non-linear activation functions, and they apply matrix multiplication to word embeddings encoded as unitary matrices. This permits them to retain all information in the processing of an input string over arbitrary distances. It also causes them to satisfy strict compositionality. URNs constitute a significant advance in the search for explainable models in deep learning applied to NLP.

2.4AIFeb 24

Predicting Sentence Acceptability Judgments in Multimodal Contexts

Hyewon Jang, Nikolai Ilinykh, Sharid Loáiciga et al.

Previous work has examined the capacity of deep neural networks (DNNs), particularly transformers, to predict human sentence acceptability judgments, both independently of context, and in document contexts. We consider the effect of prior exposure to visual images (i.e., visual context) on these judgments for humans and large language models (LLMs). Our results suggest that, in contrast to textual context, visual images appear to have little if any impact on human acceptability ratings. However, LLMs display the compression effect seen in previous work on human judgments in document contexts. Different sorts of LLMs are able to predict human acceptability judgments to a high degree of accuracy, but in general, their performance is slightly better when visual contexts are removed. Moreover, the distribution of LLM judgments varies among models, with Qwen resembling human patterns, and others diverging from them. LLM-generated predictions on sentence acceptability are highly correlated with their normalised log probabilities in general. However, the correlations decrease when visual contexts are present, suggesting that a higher gap exists between the internal representations of LLMs and their generated predictions in the presence of visual contexts. Our experimental work suggests interesting points of similarity and of difference between human and LLM processing of sentences in multimodal contexts.

13.0CLJun 19

LLM and Human Modes of Representation

Shalom Lappin

Much work on the cognitive foundations of AI has focussed on comparisons between the ways in which Large Language Models (LLMs) and humans process information and represent it. One aspect of this comparison involves determining the extent to which LLMs can achieve or surpass human performance on a variety of cognitively interesting tasks. A second explores points of convergence and divergence between LLM and human systems for processing information. Here, I consider some recent research that has addressed both issues in two informational domains. The first is the representation of linguistic knowledge. The second is real world reasoning and planning. While LLMs frequently achieve impressive levels of performance and fluency on linguistic applications, they tend to handle linguistic content in ways that are distinct from human processing. They are also, for the most part, less efficient than humans in learning and generalisation for reasoning tasks.

3.6CLApr 2, 2020Code

How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context

Jey Han Lau, Carlos S. Armendariz, Shalom Lappin et al.

We study the influence of context on sentence acceptability. First we compare the acceptability ratings of sentences judged in isolation, with a relevant context, and with an irrelevant context. Our results show that context induces a cognitive load for humans, which compresses the distribution of ratings. Moreover, in relevant contexts we observe a discourse coherence effect which uniformly raises acceptability. Next, we test unidirectional and bidirectional language models in their ability to predict acceptability ratings. The bidirectional models show very promising results, with the best model achieving a new state-of-the-art for unsupervised acceptability prediction. The two sets of experiments provide insights into the cognitive aspects of sentence processing and central issues in the computational modelling of text and discourse.

32.0CLSep 4, 2018Code

The Effect of Context on Metaphor Paraphrase Aptness Judgments

Yuri Bizzoni, Shalom Lappin

We conduct two experiments to study the effect of context on metaphor paraphrase aptness judgments. The first is an AMT crowd source task in which speakers rank metaphor paraphrase candidate sentence pairs in short document contexts for paraphrase aptness. In the second we train a composite DNN to predict these human judgments, first in binary classifier mode, and then as gradient ratings. We found that for both mean human judgments and our DNN's predictions, adding document context compresses the aptness scores towards the center of the scale, raising low out of context ratings and decreasing high out of context scores. We offer a provisional explanation for this compression effect.