CL Mar 13

Interpretable Semantic Gradients in SSD: A PCA Sweep Approach and a Case Study on AI Discourse

Hubert Plisiecki, Maria Leniarska, Jan Piotrowski, Marcin Zajenkowski

arXiv:2603.13038 10.12 citations h-index: 27

AI Analysis

This work addresses a methodological gap for researchers who use SSD in computational social science or psychology by offering a more transparent and constrained approach to dimensionality selection. The contribution is incremental, since it builds on existing SSD methods.

The paper addresses how to select the number of retained components in Supervised Semantic Differential (SSD) analysis, a step that previously lacked a systematic method and therefore introduced researcher degrees of freedom. It proposes a PCA sweep procedure that balances representation capacity, interpretability, and stability. Applied to AI discourse data, the sweep yielded a stable, interpretable gradient for Admiration narcissism that contrasts optimistic with distrustful framings. No robust alignment was found for Rivalry, and a counterfactual high-dimension heuristic produced diffuse clusters.

Supervised Semantic Differential (SSD) is a mixed quantitative-interpretive method that models how text meaning varies with continuous individual-difference variables by estimating a semantic gradient in an embedding space and interpreting its poles through clustering and text retrieval. SSD applies PCA before regression, but currently no systematic method exists for choosing the number of retained components, introducing avoidable researcher degrees of freedom in the analysis pipeline. We propose a PCA sweep procedure that treats dimensionality selection as a joint criterion over representation capacity, gradient interpretability, and stability across nearby values of K. We illustrate the method on a corpus of short posts about artificial intelligence written by Prolific participants who also completed Admiration and Rivalry narcissism scales. The sweep yields a stable, interpretable Admiration-related gradient contrasting optimistic, collaborative framings of AI with distrustful and derisive discourse, while no robust alignment emerges for Rivalry. We also show that a counterfactual analysis using a heuristic high-dimensional PCA solution produces diffuse, weakly structured clusters instead, reinforcing the value of the sweep-based choice of K. The case study shows how the PCA sweep constrains researcher degrees of freedom while preserving SSD's interpretive aims, supporting transparent and psychologically meaningful analyses of connotative meaning.
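To make the idea of sweeping over K concrete, the following is a minimal sketch and not the authors' implementation. It assumes document embeddings in a matrix `X` and a continuous trait score `y`, both replaced here by synthetic data. The function name `pca_sweep`, the use of in-sample R², and the cosine similarity between gradients at adjacent K values as a stability proxy are all illustrative assumptions. The interpretability criterion described in the abstract relies on clustering and reading retrieved texts, so it is not computed here.

```python
import numpy as np


def pca_sweep(X, y, ks):
    """Hypothetical helper: for each K, regress y on the first K principal
    component scores and map the coefficients back to embedding space."""
    Xc = X - X.mean(axis=0)          # center embeddings
    yc = y - y.mean()                # center the outcome
    # Rows of Vt are principal axes, ordered by explained variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    total_var = np.sum(S ** 2)

    results = []
    prev_grad = None
    for k in ks:
        axes = Vt[:k]                              # (k, d)
        Z = Xc @ axes.T                            # component scores, (n, k)
        coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        grad = axes.T @ coef                       # gradient in embedding space
        grad = grad / np.linalg.norm(grad)         # unit length for comparison
        resid = yc - Z @ coef
        r2 = 1.0 - (resid @ resid) / (yc @ yc)     # in-sample fit (assumed proxy)
        var_kept = np.sum(S[:k] ** 2) / total_var  # representation capacity
        # Assumed stability proxy: cosine with the gradient at the previous K.
        stability = None if prev_grad is None else float(grad @ prev_grad)
        results.append({"k": k, "var_kept": float(var_kept), "r2": float(r2),
                        "stability": stability, "gradient": grad})
        prev_grad = grad
    return results


# Synthetic stand-in data: 200 "posts" with 50-dimensional "embeddings".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_direction = rng.normal(size=50)
y = X @ true_direction + rng.normal(scale=2.0, size=200)

results = pca_sweep(X, y, ks=range(2, 41, 2))
for row in results:
    stab = "n/a" if row["stability"] is None else f"{row['stability']:.3f}"
    print(f"K={row['k']:2d}  var_kept={row['var_kept']:.3f}  "
          f"r2={row['r2']:.3f}  stability={stab}")
```

In a sweep like this, one would look for a range of K where the retained variance is adequate and the gradient changes little between neighbouring values, and then inspect the poles of the gradient qualitatively before settling on K.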
