CL CY SI Mar 16

POLAR: A Per-User Association Test in Embedding Space

Pedro Bento, Arthur Buzelin, Arthur Chagas, Yan Aquino, Victoria Estanislau, Samira Malaquias, Pedro Robles Dutenhefner, Gisele L. Pappa, Virgilio Almeida, Wagner Meira Jr

arXiv:2603.15950 73.2 h-index: 3 Has Code

AI Analysis

POLAR provides per-author diagnostic tools for computational social science, addressing a gap in analyzing author-level variation that word-, sentence-, and corpus-level probes obscure.

The paper tackles the problem of measuring lexical associations at the author level, which is obscured by existing word- or corpus-level probes, and presents POLAR, a method that cleanly separates LLM-driven bots from organic accounts on Twitter and quantifies alignment with slur lexicons on an extremist forum.

Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Authors are represented by private deterministic tokens; POLAR projects these vectors onto curated lexical axes and reports standardized effects with permutation p-values and Benjamini-Hochberg control. On a balanced bot-human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise, per-author diagnostics for computational social science. All code is publicly available at https://github.com/pedroaugtb/POLAR-A-Per-User-Association-Test-in-Embedding-Space.
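The test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a WEAT-style standardized effect (difference in mean cosine similarity between the two poles of an axis, divided by the pooled standard deviation), a two-sided permutation test over pole labels, and a standard Benjamini-Hochberg step-up procedure. The function names `cosine`, `association_test`, and `benjamini_hochberg` are hypothetical, and the author and pole vectors are assumed to be plain lists of floats already extracted from the model.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_test(author_vec, pole_a, pole_b, n_perm=2000, seed=0):
    """Assumed WEAT-style effect and two-sided permutation p-value
    for one author on one axis (pole_a versus pole_b)."""
    rng = random.Random(seed)
    sims = [cosine(author_vec, w) for w in pole_a]
    sims += [cosine(author_vec, w) for w in pole_b]
    n_a = len(pole_a)
    spread = stdev(sims)  # unchanged by permuting the pole labels

    def effect(s):
        return (mean(s[:n_a]) - mean(s[n_a:])) / spread

    observed = effect(sims)
    shuffled = sims[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random reassignment of pole labels
        if abs(effect(shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (1 + hits) / (1 + n_perm)


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under Benjamini-Hochberg false discovery rate control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank  # largest rank passing its threshold
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject
```

In this sketch, one would call `association_test` once per author and axis, collect the p-values, and pass them to `benjamini_hochberg` to decide which per-author effects to report.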

