Jeremy I Skipper

AI, Aug 11, 2023

Multimodality and Attention Increase Alignment in Natural Language Prediction Between Humans and Computational Models

Viktor Kewenig, Andrew Lampinen, Samuel A. Nastase et al.

The potential of multimodal generative artificial intelligence (mAI) to replicate human grounded language understanding, including the pragmatic, context-rich aspects of communication, remains to be clarified. Humans are known to use salient multimodal features, such as visual cues, to facilitate the processing of upcoming words. Correspondingly, multimodal computational models can integrate visual and linguistic data using a visual attention mechanism to assign next-word probabilities. To test whether these processes align, we tasked both human participants (N = 200) and several state-of-the-art computational models with evaluating the predictability of forthcoming words after viewing short audio-only or audio-visual clips with speech. During the task, the model's attention weights were recorded and human attention was indexed via eye tracking. Results show that predictability estimates from humans aligned more closely with scores generated from multimodal models than with those from their unimodal counterparts. Furthermore, including an attention mechanism doubled alignment with human judgments when visual and linguistic context facilitated predictions. In these cases, the model's attention patches and human eye tracking significantly overlapped. Our results indicate that improved modeling of naturalistic language processing in mAI does not merely depend on training diet but can be driven by multimodality in combination with attention-based architectures. Humans and computational models alike can leverage the predictive constraints of multimodal information by attending to relevant features in the input.

NC, Sep 30, 2024

The age of spiritual machines: Language quietus induces synthetic altered states of consciousness in artificial intelligence

Jeremy I Skipper, Joanna Kuc, Greg Cooper et al.

How is language related to consciousness? Language functions to categorise perceptual experiences (e.g., labelling interoceptive states as 'happy') and higher-level constructs (e.g., using 'I' to represent the narrative self). Psychedelic use and meditation might be described as altered states that impair or intentionally modify the capacity for linguistic categorisation. For example, psychedelic phenomenology is often characterised by 'oceanic boundlessness' or 'unity' and 'ego dissolution', which might be expected of a system unburdened by entrenched language categories. If language breakdown plays a role in producing such altered behaviour, multimodal artificial intelligence might align more with these phenomenological descriptions when attention is shifted away from language. We tested this hypothesis by comparing the semantic embedding spaces from simulated altered states after manipulating attentional weights in CLIP and FLAVA models to embedding spaces from altered states questionnaires before manipulation. Compared to random text and various other altered states including anxiety, models were more aligned with disembodied, ego-less, spiritual, and unitive states, as well as minimal phenomenal experiences, with decreased attention to language and vision. Reduced attention to language was associated with distinct linguistic patterns and blurred embeddings within and, especially, across semantic categories (e.g., 'giraffes' become more like 'bananas'). These results lend support to the role of language categorisation in the phenomenology of altered states of consciousness, like those experienced with high doses of psychedelics or concentration meditation, states that often lead to improved mental health and wellbeing.

2 Papers