LG, AI, CY, HC | Oct 8, 2025

Investigating Thematic Patterns and User Preferences in LLM Interactions using BERTopic

arXiv:2510.07557v1 | 2 citations | h-index: 1
Originality: Synthesis-oriented
AI Analysis

The study offers insights for improving LLM performance in specific domains, though its contribution is incremental: it applies an existing method to new conversational data.

This study applied BERTopic to the lmsys-chat-1m dataset of LLM conversations, identified 29 coherent topics such as AI and programming, and examined how user preferences for different LLMs correlate with these topics in order to inform model optimization.

This study applies BERTopic, a transformer-based topic modeling technique, to the lmsys-chat-1m dataset, a multilingual conversational corpus built from head-to-head evaluations of large language models (LLMs). Each user prompt is paired with two anonymized LLM responses and a human preference label that records how the user judged the competing outputs. The main objective is to uncover thematic patterns in these conversations and examine how they relate to user preferences, in particular whether certain LLMs are consistently preferred within specific topics. A robust preprocessing pipeline was designed to handle multilingual variation, balance dialogue turns, and clean noisy or redacted data. BERTopic extracted over 29 coherent topics, including artificial intelligence, programming, ethics, and cloud infrastructure. We analysed relationships between topics and model preferences to identify trends in model-topic alignment. Visualization techniques included inter-topic distance maps, topic probability distributions, and model-versus-topic matrices. Our findings inform domain-specific fine-tuning and optimization strategies for improving real-world LLM performance and user satisfaction.
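As a rough illustration of the model-versus-topic matrix mentioned in the abstract, the sketch below computes a per-topic win rate for each model from pairwise preference records. It is not the authors' code. The record fields (`topic`, `model_a`, `model_b`, `winner`), the model names, and the sample data are all hypothetical, and the topic ids are assumed to have been assigned beforehand by a topic model such as BERTopic, which labels outlier documents with topic -1.

```python
from collections import defaultdict


def model_topic_win_rates(records):
    """Return {topic: {model: win_rate}} from pairwise preference records.

    Each record is a dict with hypothetical keys:
      topic   - integer topic id (assumed precomputed; -1 means outlier)
      model_a - name of the first model in the comparison
      model_b - name of the second model in the comparison
      winner  - "model_a", "model_b", or "tie"
    """
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))

    for rec in records:
        topic = rec["topic"]
        if topic == -1:
            # Skip documents that were not assigned to any topic.
            continue
        # Both models took part in this comparison.
        for side in ("model_a", "model_b"):
            games[topic][rec[side]] += 1
        # Ties count as a game for both models but a win for neither.
        if rec["winner"] in ("model_a", "model_b"):
            wins[topic][rec[rec["winner"]]] += 1

    return {
        topic: {model: wins[topic][model] / n for model, n in models.items()}
        for topic, models in games.items()
    }


# Hypothetical sample data, for illustration only.
records = [
    {"topic": 0, "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"topic": 0, "model_a": "model-y", "model_b": "model-x", "winner": "model_b"},
    {"topic": 1, "model_a": "model-x", "model_b": "model-y", "winner": "model_b"},
    {"topic": 1, "model_a": "model-x", "model_b": "model-y", "winner": "tie"},
    {"topic": -1, "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
]

rates = model_topic_win_rates(records)
for topic in sorted(rates):
    print(topic, dict(sorted(rates[topic].items())))
```

A table of this kind, with topics as rows and models as columns, is one plausible input for a heatmap-style view of model-topic alignment. Counting a tie as a game without a win is a design choice made here for simplicity; other conventions, such as half a win for each side, are equally defensible.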

