Oleg Smirnov

h-index 22

7 papers

2,279 citations

Novelty 30%

AI Score 31

Ranked #131,143 of 194,257 authors (top 68%); #28,873 in LG (top 72%)

7 Papers

10.1 CY Jun 28

The Language You Ask In: Language-Conditioned Ideological Divergence in LLM Analysis of Contested Political Documents

Oleg Smirnov

Large language models are increasingly used to interpret politically contested questions, value-laden material on which there is no single correct answer, only competing interpretive traditions. We ask whether a model's choice among those traditions can turn on the language of the prompt rather than the content. Comparing two frontier models, ChatGPT 5.2 and Claude Opus 4.5, on one contested Ukrainian civil-society document under semantically matched Russian and Ukrainian prompts, we find that both shift along the same axis on identical source text: Russian prompts elicit delegitimizing readings of the document's authors and Ukrainian prompts legitimating ones. The magnitude is model-dependent but neither model is neutral: each adopts a language-dependent stance, and the difference is one of degree. Because contested political questions admit no correct reading against which to measure, we read this as language-conditioned variation in which interpretive tradition a model activates: the model neither holds a single stance nor surfaces the plurality of available ones, but silently adopts the dominant frame of the prompt's language. We draw out the consequences for pluralism-aware evaluation, which must probe the same content across the languages a model serves, and for pluralistic alignment in multilingual settings.

16.9 LG Feb 7, 2025 Code

On the Power of Heuristics in Temporal Graphs

Filip Cornell, Oleg Smirnov, Gabriela Zarzar Gandler et al.

Dynamic graph datasets often exhibit strong temporal patterns, such as recency, which prioritizes recent interactions, and popularity, which favors frequently occurring nodes. We demonstrate that simple heuristics leveraging only these patterns can perform on par with or outperform state-of-the-art neural network models under standard evaluation protocols. To further explore these dynamics, we introduce metrics that quantify the impact of recency and popularity across datasets. Our experiments on BenchTemp and the Temporal Graph Benchmark show that our approaches achieve state-of-the-art performance across all datasets in the latter and secure top ranks on multiple datasets in the former. These results emphasize the importance of refined evaluation schemes to enable fair comparisons and promote the development of more robust temporal graph models. Additionally, they reveal that current deep learning methods often struggle to capture the key patterns underlying predictions in real-world temporal graphs. For reproducibility, we have made our code publicly available.
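As a rough illustration of what a recency-plus-popularity heuristic could look like, the sketch below scores candidate destination nodes for a source node from an interaction history. It is not the authors' implementation: the function name `score_candidates`, the mixing weight `alpha`, and the specific recency and popularity formulas are assumptions chosen for brevity.

```python
from collections import Counter


def score_candidates(history, source, candidates, now, alpha=0.5):
    """Score candidate destinations for `source` (hypothetical helper).

    history: iterable of (src, dst, t) interactions observed before `now`.
    alpha:   assumed mixing weight; 1.0 = recency only, 0.0 = popularity only.
    """
    popularity = Counter()  # how often each destination appears globally
    last_seen = {}          # latest time `source` interacted with each destination
    for src, dst, t in history:
        popularity[dst] += 1
        if src == source:
            last_seen[dst] = max(t, last_seen.get(dst, t))

    max_pop = max(popularity.values(), default=1)
    scores = {}
    for c in candidates:
        pop = popularity[c] / max_pop  # normalized global popularity
        # Recency decays with the time elapsed since the last interaction.
        rec = 1.0 / (1.0 + now - last_seen[c]) if c in last_seen else 0.0
        scores[c] = alpha * rec + (1.0 - alpha) * pop
    return scores


history = [("u", "a", 1), ("v", "b", 2), ("w", "b", 3), ("u", "c", 4)]
recency_only = score_candidates(history, "u", ["a", "b", "c"], now=5, alpha=1.0)
popularity_only = score_candidates(history, "u", ["a", "b", "c"], now=5, alpha=0.0)
```

With `alpha=1.0` the ranking depends only on how recently `u` interacted with each candidate, and with `alpha=0.0` only on global frequency, so a baseline of this kind involves no learned parameters at all.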

11.5 LG Apr 5, 2024

player2vec: A Language Modeling Approach to Understand Player Behavior in Games

Tianze Wang, Maryam Honari-Jahromi, Styliani Katsarou et al.

Methods for learning latent user representations from historical behavior logs have gained traction for recommendation tasks in e-commerce, content streaming, and other settings. However, this area still remains relatively underexplored in video and mobile gaming contexts. In this work, we present a novel method for overcoming this limitation by extending a long-range Transformer model from the natural language processing domain to player behavior data. We discuss specifics of behavior tracking in games and propose preprocessing and tokenization approaches by viewing in-game events in an analogous way to words in sentences, thus enabling learning player representations in a self-supervised manner in the absence of ground-truth annotations. We experimentally demonstrate the efficacy of the proposed approach in fitting the distribution of behavior events by evaluating intrinsic language modeling metrics. Furthermore, we qualitatively analyze the emerging structure of the learned embedding space and show its value for generating insights into behavior patterns to inform downstream applications.
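The idea of treating in-game events like words in sentences can be sketched as a small vocabulary-and-encoding step. This is a minimal illustration under stated assumptions, not the paper's preprocessing pipeline: the event names, the special tokens, and the helpers `build_vocab` and `encode` are all hypothetical.

```python
from collections import Counter

# Assumed special tokens: padding, unknown event, and session separator.
SPECIAL = ["[PAD]", "[UNK]", "[SEP]"]


def build_vocab(sessions, min_count=1):
    """Map each sufficiently frequent event name to an integer token id."""
    counts = Counter(event for session in sessions for event in session)
    tokens = SPECIAL + sorted(e for e, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(tokens)}


def encode(sessions, vocab):
    """Flatten sessions into one id sequence, closing each session with [SEP]."""
    ids = []
    for session in sessions:
        ids.extend(vocab.get(event, vocab["[UNK]"]) for event in session)
        ids.append(vocab["[SEP]"])
    return ids


# Hypothetical event logs for one player, grouped by session.
sessions = [["level_start", "level_fail"], ["level_start", "purchase"]]
vocab = build_vocab(sessions)
token_ids = encode(sessions, vocab)
```

A sequence like `token_ids` is the kind of input a self-supervised language model can be pretrained on, for example with a masked-token objective, without any ground-truth labels.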

4.6 LG Dec 5, 2024

Expressivity of Representation Learning on Continuous-Time Dynamic Graphs: An Information-Flow Centric Review

Sofiane Ennadir, Gabriela Zarzar Gandler, Filip Cornell et al.

Graphs are ubiquitous in real-world applications, ranging from social networks to biological systems, and have inspired the development of Graph Neural Networks (GNNs) for learning expressive representations. While most research has centered on static graphs, many real-world scenarios involve dynamic, temporally evolving graphs, motivating the need for Continuous-Time Dynamic Graph (CTDG) models. This paper provides a comprehensive review of Graph Representation Learning (GRL) on CTDGs with a focus on Self-Supervised Representation Learning (SSRL). We introduce a novel theoretical framework that analyzes the expressivity of CTDG models through an Information-Flow (IF) lens, quantifying their ability to propagate and encode temporal and structural information. Leveraging this framework, we categorize existing CTDG methods based on their suitability for different graph types and application scenarios. Within the same scope, we examine the design of SSRL methods tailored to CTDGs, such as predictive and contrastive approaches, highlighting their potential to mitigate the reliance on labeled data. Empirical evaluations on synthetic and real-world datasets validate our theoretical insights, demonstrating the strengths and limitations of various methods across long-range, bipartite, and community-based graphs. This work offers both a theoretical foundation and practical guidance for selecting and developing CTDG models, advancing the understanding of GRL in dynamic settings.

7.1 LG Jun 14, 2025

Are We Really Measuring Progress? Transferring Insights from Evaluating Recommender Systems to Temporal Link Prediction

Filip Cornell, Oleg Smirnov, Gabriela Zarzar Gandler et al.

Recent work has questioned the reliability of graph learning benchmarks, citing concerns around task design, methodological rigor, and data suitability. In this extended abstract, we contribute to this discussion by focusing on evaluation strategies in Temporal Link Prediction (TLP). We observe that current evaluation protocols are often affected by one or more of the following issues: (1) inconsistent sampled metrics, (2) reliance on hard negative sampling often introduced as a means to improve robustness, and (3) metrics that implicitly assume equal base probabilities across source nodes by combining predictions. We support these claims through illustrative examples and connections to longstanding concerns in the recommender systems community. Our ongoing work aims to systematically characterize these problems and explore alternatives that can lead to more robust and interpretable evaluation. We conclude with a discussion of potential directions for improving the reliability of TLP benchmarks.

7.1 LG Feb 7, 2025

Unified Approaches in Self-Supervised Event Stream Modeling: Progress and Prospects

Levente Zólyomi, Tianze Wang, Sofiane Ennadir et al.

The proliferation of digital interactions across diverse domains, such as healthcare, e-commerce, gaming, and finance, has resulted in the generation of vast volumes of event stream (ES) data. ES data comprises continuous sequences of timestamped events that encapsulate detailed contextual information relevant to each domain. While ES data holds significant potential for extracting actionable insights and enhancing decision-making, its effective utilization is hindered by challenges such as the scarcity of labeled data and the fragmented nature of existing research efforts. Self-Supervised Learning (SSL) has emerged as a promising paradigm to address these challenges by enabling the extraction of meaningful representations from unlabeled ES data. In this survey, we systematically review and synthesize SSL methodologies tailored for ES modeling across multiple domains, bridging the gaps between domain-specific approaches that have traditionally operated in isolation. We present a comprehensive taxonomy of SSL techniques, encompassing both predictive and contrastive paradigms, and analyze their applicability and effectiveness within different application contexts. Furthermore, we identify critical gaps in current research and propose a future research agenda aimed at developing scalable, domain-agnostic SSL frameworks for ES modeling. By unifying disparate research efforts and highlighting cross-domain synergies, this survey aims to accelerate innovation, improve reproducibility, and expand the applicability of SSL to diverse real-world ES challenges.

22.4 LG Oct 24, 2024

Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study

Tianze Wang, Maryam Honari-Jahromi, Styliani Katsarou et al.

This pilot study explores the application of language models (LMs) to model game event sequences, treating them as a customized natural language. We investigate a popular mobile game, transforming raw event data into textual sequences and pretraining a Longformer model on this data. Our approach captures the rich and nuanced interactions within game sessions, effectively identifying meaningful player segments. The results demonstrate the potential of self-supervised LMs in enhancing game design and personalization without relying on ground-truth labels.