CL | AI | Jan 1, 2025

Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models

arXiv:2501.00828v121 citations | h-index: 3 | COLING
Originality: Synthesis-oriented
AI Analysis

The paper addresses how language models process stylistic information, a problem relevant to researchers, but the contribution is incremental because it builds on existing embedding analysis.

This paper analyzes how writing style affects embedding vector dispersion across state-of-the-art language models, finding that the models are sensitive to stylistic variation in a literary corpus in French and English.

This paper analyzes how writing style affects the dispersion of embedding vectors across multiple state-of-the-art language models. While early transformer models primarily aligned with topic modeling, this study examines the role of writing style in shaping embedding spaces. Using a literary corpus that alternates between topics and styles, we compare the sensitivity of language models across French and English. By analyzing the particular impact of style on embedding dispersion, we aim to better understand how language models process stylistic information, contributing to their overall interpretability.
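The abstract does not say how dispersion is measured, so the following is only a minimal sketch under one common assumption: dispersion is taken as the mean cosine distance of each embedding to the centroid of its group. The function names (`centroid`, `cosine_distance`, `dispersion`) and the toy two-dimensional vectors are hypothetical illustrations, not the paper's method or data.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; undefined when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def dispersion(vectors):
    """Assumed dispersion measure: mean cosine distance to the group centroid."""
    if not vectors:
        raise ValueError("need at least one vector")
    c = centroid(vectors)
    return sum(cosine_distance(v, c) for v in vectors) / len(vectors)


# Toy stand-ins for text embeddings (hypothetical values, not real model output).
same_style = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]   # tightly clustered
mixed_style = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # spread out

print(dispersion(same_style) < dispersion(mixed_style))
```

Under this assumed measure, one could compare the dispersion of texts that share a topic but vary in style against texts that share a style but vary in topic, separately for each model and language.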

Code Implementations: 1 repo
