Jan Vávra

ME
h-index: 8
3 papers
7 citations
Novelty: 52%
AI Score: 26

3 Papers

ME Oct 14, 2024
A Structural Text-Based Scaling Model for Analyzing Political Discourse

Jan Vávra, Bernd Hans-Konrad Prostmaier, Bettina Grün et al.

Scaling political actors based on their individual characteristics and behavior helps to profile and group them and to understand changes in the political landscape. In this paper we introduce the Structural Text-Based Scaling (STBS) model to infer ideological positions of speakers for latent topics from text data. We expand the usual Poisson factorization specification for topic modeling of text data and use flexible shrinkage priors to induce sparsity and enhance interpretability. We also incorporate speaker-specific covariates to assess their association with ideological positions. Applying STBS to U.S. Senate speeches from the 114th Congress, we identify immigration and gun violence as the most polarizing topics between the two major parties. Additionally, we find that, in discussions about abortion, the gender of the speaker significantly influences their position, with female speakers focusing more on women's health. We also see that a speaker's region of origin influences their ideological position more than their religious affiliation does.
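As a rough illustration of what "expanding the Poisson factorization specification" with topic-specific positions could look like, here is a minimal NumPy sketch. The abstract does not state the likelihood, so the rate formula below (borrowed from the general shape of text-based ideal point models) is an assumption, as are all names (`topic_position_rates`, `theta`, `beta`, `eta`, `positions`, `speaker_of_doc`) and the toy dimensions. Shrinkage priors and speaker covariates are not modeled here.

```python
import numpy as np

def topic_position_rates(theta, beta, eta, positions, speaker_of_doc):
    """Poisson rates for a document-term count matrix (assumed form).

    theta:          (D, K) non-negative document-topic intensities
    beta:           (K, V) non-negative topic-term intensities
    eta:            (K, V) polarity of each term within each topic
    positions:      (S, K) topic-specific position of each speaker
    speaker_of_doc: (D,)   index of the speaker who authored each document
    """
    x = positions[speaker_of_doc]                    # (D, K)
    # Each term's intensity is tilted by the speaker's position on that topic.
    tilt = np.exp(x[:, :, None] * eta[None, :, :])   # (D, K, V)
    # rate[d, v] = sum_k theta[d, k] * beta[k, v] * tilt[d, k, v]
    return np.einsum("dk,kv,dkv->dv", theta, beta, tilt)

# Toy data; all sizes and hyperparameters are arbitrary.
rng = np.random.default_rng(0)
D, K, V, S = 6, 2, 5, 3
theta = rng.gamma(0.3, 1.0, size=(D, K))
beta = rng.gamma(0.3, 1.0, size=(K, V))
eta = rng.normal(size=(K, V))
positions = rng.normal(size=(S, K))
speaker_of_doc = rng.integers(0, S, size=D)

rates = topic_position_rates(theta, beta, eta, positions, speaker_of_doc)
counts = rng.poisson(rates)  # simulated word counts
```

Under this assumed form, setting every position to zero removes the tilt, and the rates reduce to the plain Poisson factorization product `theta @ beta`.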

ME Oct 24, 2024
Evolving Voices Based on Temporal Poisson Factorisation

Jan Vávra, Bettina Grün, Paul Hofmarcher

The world is evolving and so is the vocabulary used to discuss topics in speech. Analysing political speech data spanning more than 30 years requires flexible topic models to uncover the latent topics, their change in prevalence over time, and the change in their vocabulary. We propose the temporal Poisson factorisation (TPF) model as an extension of the Poisson factorisation model for sparse count matrices obtained from time-stamped text documents under the bag-of-words assumption. We discuss and empirically compare different model specifications for the time-varying latent variables, consisting either of a flexible auto-regressive structure of order one or of a random walk. Estimation is based on variational inference, where we combine coordinate ascent updates with automatic differentiation using batching of documents. Suitable variational families are proposed to ease inference. We compare results obtained using independent univariate variational distributions for the time-varying latent variables to those obtained with a multivariate variant. We discuss in detail the results of the TPF model when analysing speeches from 18 sessions in the U.S. Senate (1981-2016).
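The two specifications compared for the time-varying latent variables differ only in the autoregressive coefficient. The following is a minimal sketch, not the TPF model itself: `simulate_latent_path`, `phi` and the innovation scale are hypothetical, and only the count of 18 time points is taken from the abstract.

```python
import numpy as np

def simulate_latent_path(innovations, phi):
    """Latent path x[t] = phi * x[t-1] + innovations[t], with x[0] = innovations[0].

    |phi| < 1 gives a stationary AR(1) process; phi = 1 gives a random walk.
    """
    innovations = np.asarray(innovations, dtype=float)
    path = np.empty_like(innovations)
    path[0] = innovations[0]
    for t in range(1, len(innovations)):
        path[t] = phi * path[t - 1] + innovations[t]
    return path

rng = np.random.default_rng(1)
innovations = 0.5 * rng.standard_normal(18)  # one shock per Senate session

ar1_path = simulate_latent_path(innovations, phi=0.8)     # shocks decay over time
random_walk = simulate_latent_path(innovations, phi=1.0)  # shocks persist
```

With the same shocks, the AR(1) path is pulled back towards zero while the random walk accumulates every shock, which is why the two specifications can imply different long-run behaviour for topic prevalence and vocabulary.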

ME Mar 4, 2025
Seeded Poisson Factorization: leveraging domain knowledge to fit topic models

Bernd Prostmaier, Jan Vávra, Bettina Grün et al.

Topic models are widely used for discovering latent thematic structures in large text corpora, yet traditional unsupervised methods often struggle to align with pre-defined conceptual domains. This paper introduces seeded Poisson Factorization (SPF), a novel approach that extends the Poisson Factorization (PF) framework by incorporating domain knowledge through seed words. SPF enables structured topic discovery by modifying the prior distribution of topic-specific term intensities, assigning higher initial rates to pre-defined seed words. The model is estimated using variational inference with stochastic gradient optimization, ensuring scalability to large datasets. We present in detail the results of applying SPF to an Amazon customer feedback dataset, leveraging pre-defined product categories as guiding structures. SPF achieves superior performance compared to alternative guided probabilistic topic models in terms of computational efficiency and classification performance. Robustness checks highlight SPF's ability to adaptively balance domain knowledge and data-driven topic discovery, even when the seed word selection is imperfect. Further applications of SPF to four additional benchmark datasets, where the corpus varies in size and the number of topics differs, demonstrate its generally superior classification performance compared to the unseeded PF model.
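One simple way to picture "higher initial rates for seed words" is to raise the shape of a gamma prior on the topic-term intensities wherever a term is a seed word for that topic. This is only a sketch under that assumption, since the exact prior used by SPF is not given in the abstract. The function name, hyperparameter values, vocabulary and seed lists are all hypothetical.

```python
import numpy as np

def seeded_gamma_shapes(vocab, seed_words, base_shape=0.3, seed_boost=1.0):
    """Return a (K, V) array of gamma prior shapes, boosted for seed terms.

    vocab:      list of V terms
    seed_words: dict mapping each of K topic labels to a list of seed terms
    """
    index = {term: i for i, term in enumerate(vocab)}
    shapes = np.full((len(seed_words), len(vocab)), base_shape)
    for k, words in enumerate(seed_words.values()):
        for word in words:
            if word in index:  # seed terms missing from the vocabulary are skipped
                shapes[k, index[word]] += seed_boost
    return shapes

# Hypothetical product categories and seed terms.
vocab = ["battery", "screen", "novel", "author", "cotton"]
seed_words = {
    "electronics": ["battery", "screen"],
    "books": ["novel", "author", "paperback"],  # "paperback" is not in vocab
}

shapes = seeded_gamma_shapes(vocab, seed_words)
prior_rate = 1.0
prior_mean = shapes / prior_rate  # mean of Gamma(shape, rate) is shape / rate

# Draw topic-term intensities from the seeded prior.
rng = np.random.default_rng(2)
beta = rng.gamma(shape=shapes, scale=1.0 / prior_rate)
```

Because the boost only shifts the prior, the data can still move intensities away from the seeds during inference, consistent with the abstract's point about balancing domain knowledge against data-driven discovery.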