Shuhua Yin

h-index5
2papers

2 Papers

13.5NAApr 21
Toward Practical Forecasts of Public Sentiments via Convexification for Mean Field Games: Evidence from Real World COVID-19 Discussion Data

Shi Chen, Michael V. Klibanov, Kevin McGoff et al.

We apply a convexification-based numerical method to forecast public sentiment dynamics using Mean Field Games (MFGs). The theoretical foundation for the convexification approach, established in our prior work, guarantees global convergence to the unique solution to the MFG system. The present work demonstrates the practical potential of this framework using real-world sentiment data extracted from social media public discussion during the COVID-19 pandemic. The results show that the MFG model with appropriate parameters and convexification yields sentiment density predictions that align closely with observed data and satisfy the governing equations. While current parameter selection relies on manual calibration, our findings establish the first proof-of-concept evidence that MFG models can capture complex temporal patterns in public sentiment, laying the groundwork for future work on systematic parameter identification methods, i.e. solutions of coefficient inverse problems for the MFG system.

CLOct 21, 2025
Improving Topic Modeling of Social Media Short Texts with Rephrasing: A Case Study of COVID-19 Related Tweets

Wangjiaxuan Xin, Shuhua Yin, Shi Chen et al.

Social media platforms such as Twitter (now X) provide rich data for analyzing public discourse, especially during crises such as the COVID-19 pandemic. However, the brevity, informality, and noise of social media short texts often hinder the effectiveness of traditional topic modeling, producing incoherent or redundant topics that are often difficult to interpret. To address these challenges, we have developed \emph{TM-Rephrase}, a model-agnostic framework that leverages large language models (LLMs) to rephrase raw tweets into more standardized and formal language prior to topic modeling. Using a dataset of 25,027 COVID-19-related Twitter posts, we investigate the effects of two rephrasing strategies, general- and colloquial-to-formal-rephrasing, on multiple topic modeling methods. Results demonstrate that \emph{TM-Rephrase} improves three metrics measuring topic modeling performance (i.e., topic coherence, topic uniqueness, and topic diversity) while reducing topic redundancy of most topic modeling algorithms, with the colloquial-to-formal strategy yielding the greatest performance gains and especially for the Latent Dirichlet Allocation (LDA) algorithm. This study contributes to a model-agnostic approach to enhancing topic modeling in public health related social media analysis, with broad implications for improved understanding of public discourse in health crisis as well as other important domains.