Georgios N. Yannakakis

HC
h-index: 69
59 papers
1,309 citations
Novelty: 38%
AI Score: 53

59 Papers

CL · Mar 13, 2023 · Code
Architext: Language-Driven Generative Architecture Design

Theodoros Galanos, Antonios Liapis, Georgios N. Yannakakis

Architectural design is a highly complex practice that involves a wide diversity of disciplines, technologies, proprietary design software, expertise, and an almost infinite number of constraints, across a vast array of design tasks. Enabling intuitive, accessible, and scalable design processes is an important step towards performance-driven and sustainable design for all. To that end, we introduce Architext, a novel semantic generation assistive tool. Architext enables design generation with only natural language prompts, given to large-scale Language Models, as input. We conduct a thorough quantitative evaluation of Architext's downstream task performance, focusing on semantic accuracy and diversity for a number of pre-trained language models ranging from 120 million to 6 billion parameters. Architext models are able to learn the specific design task, generating valid residential layouts at a near 100% rate. Accuracy shows great improvement when scaling the models, with the largest model (GPT-J) yielding impressive accuracy ranging from 25% to over 80% for different prompt categories. We open source the fine-tuned Architext models and our synthetic dataset, hoping to inspire experimentation in this exciting area of design research.

AI · Aug 26, 2022
Generative Personas That Behave and Experience Like Humans

Matthew Barthet, Ahmed Khalifa, Antonios Liapis et al.

Using artificial intelligence (AI) to automatically test a game remains a critical challenge for the development of richer and more complex game worlds and for the advancement of AI at large. One of the most promising methods for achieving that long-standing goal is the use of generative AI agents, namely procedural personas, that attempt to imitate particular playing behaviors which are represented as rules, rewards, or human demonstrations. All research efforts for building those generative agents, however, have focused solely on playing behavior which is arguably a narrow perspective of what a player actually does in a game. Motivated by this gap in the existing state of the art, in this paper we extend the notion of behavioral procedural personas to cater for player experience, thus examining generative agents that can both behave and experience their game as humans would. For that purpose, we employ the Go-Explore reinforcement learning paradigm for training human-like procedural personas, and we test our method on behavior and experience demonstrations of more than 100 players of a racing game. Our findings suggest that the generated agents exhibit distinctive play styles and experience responses of the human personas they were designed to imitate. Importantly, it also appears that experience, which is tied to playing behavior, can be a highly informative driver for better behavioral exploration.

HC · Sep 25, 2023
Affective Game Computing: A Survey

Georgios N. Yannakakis, David Melhart

This paper surveys the current state of the art in affective computing principles, methods and tools as applied to games. We review this emerging field, namely affective game computing, through the lens of the four core phases of the affective loop: game affect elicitation, game affect sensing, game affect detection and game affect adaptation. In addition, we provide a taxonomy of terms, methods and approaches used across the four phases of the affective game loop and situate the field within this taxonomy. We continue with a comprehensive review of available affect data collection methods with regards to gaming interfaces, sensors, annotation protocols, and available corpora. The paper concludes with a discussion on the current limitations of affective game computing and our vision for the most promising future research directions in the field.

LG · Sep 7, 2022
Open-Ended Evolution for Minecraft Building Generation

Matthew Barthet, Antonios Liapis, Georgios N. Yannakakis

This paper proposes a procedural content generator which evolves Minecraft buildings according to an open-ended and intrinsic definition of novelty. To realize this goal we evaluate individuals' novelty in the latent space using a 3D autoencoder, and alternate between phases of exploration and transformation. During exploration the system evolves multiple populations of CPPNs through CPPN-NEAT and constrained novelty search in the latent space (defined by the current autoencoder). We apply a set of repair and constraint functions to ensure candidates adhere to basic structural rules and constraints during evolution. During transformation, we reshape the boundaries of the latent space to identify new interesting areas of the solution space by retraining the autoencoder with novel content. In this study we evaluate five different approaches for training the autoencoder during transformation and its impact on populations' quality and diversity during evolution. Our results show that by retraining the autoencoder we can achieve better open-ended complexity compared to a static model, which is further improved when retraining using larger datasets of individuals with diverse complexities.
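A common way to score novelty in a latent space is the mean distance to the k nearest neighbours. The sketch below illustrates that general idea only; it is not the authors' implementation. The 2D vectors, the archive, and the value of `k` are hypothetical stand-ins for what a trained 3D autoencoder would provide.

```python
import math


def novelty_score(candidate, others, k=3):
    """Mean Euclidean distance from `candidate` to its k nearest neighbours.

    Assumes `others` is non-empty.
    """
    distances = sorted(math.dist(candidate, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Hypothetical 2D latent vectors; a real encoder would produce these.
population = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
archive = [(0.05, 0.05)]

scores = []
for i, individual in enumerate(population):
    # Compare each individual against the rest of the population and the archive.
    others = population[:i] + population[i + 1:] + archive
    scores.append(novelty_score(individual, others))

most_novel = population[scores.index(max(scores))]
```

Retraining the encoder changes the latent vectors and therefore the scores, which is the effect the transformation phase relies on.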

HC · Aug 25, 2022
Supervised Contrastive Learning for Affect Modelling

Kosmas Pinitas, Konstantinos Makantasis, Antonios Liapis et al.

Affect modeling is viewed, traditionally, as the process of mapping measurable affect manifestations from multiple modalities of user input to affect labels. That mapping is usually inferred through end-to-end (manifestation-to-affect) machine learning processes. What if, instead, one trains general, subject-invariant representations that consider affect information and then uses such representations to model affect? In this paper we assume that affect labels form an integral part, and not just the training signal, of an affect representation and we explore how the recent paradigm of contrastive learning can be employed to discover general high-level affect-infused representations for the purpose of modeling affect. We introduce three different supervised contrastive learning approaches for training representations that consider affect information. In this initial study we test the proposed methods for arousal prediction in the RECOLA dataset based on user information from multiple modalities. Results demonstrate the representation capacity of contrastive learning and its efficiency in boosting the accuracy of affect models. Beyond their evidenced higher performance compared to end-to-end arousal classification, the resulting representations are general-purpose and subject-agnostic, as training is guided through general affect information available in any multimodal corpus.
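For orientation, here is a minimal sketch of a generic supervised contrastive loss, in which samples sharing a label are pulled together and all others pushed apart. It is not one of the three approaches from the paper, whose details the abstract does not give. The embeddings, the binary labels, and the temperature value are illustrative assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical embeddings forming two clusters.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
aligned_loss = supcon_loss(embeddings, labels=[1, 1, 0, 0])   # labels match clusters
shuffled_loss = supcon_loss(embeddings, labels=[1, 0, 1, 0])  # labels cut across clusters
```

The loss is lower when the labels agree with the geometry of the embeddings, which is the property a representation learner would be trained to achieve.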

LG · Aug 26, 2022
Play with Emotion: Affect-Driven Reinforcement Learning

Matthew Barthet, Ahmed Khalifa, Antonios Liapis et al.

This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning (RL) process. According to the proposed paradigm, RL agents learn a policy (i.e. affective interaction) by attempting to maximize a set of rewards (i.e. behavioral and affective patterns) via their experience with their environment (i.e. context). Our hypothesis is that RL is an effective paradigm for interweaving affect elicitation and manifestation with behavioral and affective demonstrations. Importantly, our second hypothesis -- building on Damasio's somatic marker hypothesis -- is that emotion can be the facilitator of decision-making. We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior; Go-Blend is a modified version of the Go-Explore algorithm which has recently showcased supreme performance in hard exploration tasks. We first vary the arousal-based reward function and observe agents that can effectively display a palette of affect and behavioral patterns according to the specified reward. Then we use arousal-based state selection mechanisms in order to bias the strategies that Go-Blend explores. Our findings suggest that Go-Blend not only is an efficient affect modeling paradigm but, more importantly, affect-driven RL improves exploration and yields higher performing agents, validating Damasio's hypothesis in the domain of games.

CV · Jun 13, 2022
Learning Task-Independent Game State Representations from Unlabeled Images

Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis et al.

Self-supervised learning (SSL) techniques have been widely used to learn compact and informative representations from high-dimensional complex data. In many computer vision tasks, such as image classification, such methods achieve state-of-the-art results that surpass supervised learning approaches. In this paper, we investigate whether SSL methods can be leveraged for the task of learning accurate state representations of games, and if so, to what extent. For this purpose, we collect game footage frames and corresponding sequences of games' internal state from three different 3D games: VizDoom, the CARLA racing simulator and the Google Research Football Environment. We train an image encoder with three widely used SSL algorithms using solely the raw frames, and then attempt to recover the internal state variables from the learned representations. Our results across all three games showcase significantly higher correlation between SSL representations and the game's internal state compared to pre-trained baseline models such as ImageNet. Such findings suggest that SSL-based visual encoders can yield general -- not tailored to a specific task -- yet informative game representations solely from game pixel information. Such representations can, in turn, form the basis for boosting the performance of downstream learning tasks in games, including gameplaying, content generation and player modeling.

NE · Apr 14, 2022
RankNEAT: Outperforming Stochastic Gradient Search in Preference Learning Tasks

Kosmas Pinitas, Konstantinos Makantasis, Antonios Liapis et al.

Stochastic gradient descent (SGD) is a premium optimization method for training neural networks, especially for learning objectively defined labels such as image objects and events. When a neural network is instead faced with subjectively defined labels -- such as human demonstrations or annotations -- SGD may struggle to explore the deceptive and noisy loss landscapes caused by the inherent bias and subjectivity of humans. While neural networks are often trained via preference learning algorithms in an effort to eliminate such data noise, the de facto training methods rely on gradient descent. Motivated by the lack of empirical studies on the impact of evolutionary search on the training of preference learners, we introduce the RankNEAT algorithm which learns to rank through neuroevolution of augmenting topologies. We test the hypothesis that RankNEAT outperforms traditional gradient-based preference learning within the affective computing domain, in particular predicting annotated player arousal from the game footage of three dissimilar games. RankNEAT yields superior performances compared to the gradient-based preference learner (RankNet) in the majority of experiments since its architecture optimization capacity acts as an efficient feature selection mechanism, thereby eliminating overfitting. Results suggest that RankNEAT is a viable and highly efficient evolutionary alternative to preference learning.
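As background for the comparison above, RankNet-style learners minimise a pairwise logistic loss on the difference between the scores of the preferred and the non-preferred item. The sketch below shows that objective only. How RankNEAT evaluates its evolved networks is not stated in the abstract, so nothing here should be read as its fitness function, and the example scores are hypothetical.

```python
import math


def ranknet_loss(score_preferred, score_other):
    """Pairwise logistic loss: -log(sigmoid(score_preferred - score_other)).

    Written as a numerically stable softplus of the negated difference.
    """
    x = score_other - score_preferred
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical model outputs for two game-footage segments.
correct_order = ranknet_loss(2.0, 0.0)   # model agrees with the annotation
wrong_order = ranknet_loss(0.0, 2.0)     # model disagrees with the annotation
tie = ranknet_loss(1.0, 1.0)             # equals log(2)
```

Because only score differences matter, the learner needs ordinal annotations (which segment is more arousing) rather than absolute labels.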

AI · Oct 14, 2022
The Invariant Ground Truth of Affect

Konstantinos Makantasis, Kosmas Pinitas, Antonios Liapis et al.

Affective computing strives to unveil the unknown relationship between affect elicitation, manifestation of affect and affect annotations. The ground truth of affect, however, is predominantly attributed to the affect labels which inadvertently include biases inherent to the subjective nature of emotion and its labeling. The response to such limitations is usually augmenting the dataset with more annotations per data point; however, this is not possible when we are interested in self-reports via first-person annotation. Moreover, outlier detection methods based on inter-annotator agreement only consider the annotations themselves and ignore the context and the corresponding affect manifestation. This paper reframes the ways one may obtain a reliable ground truth of affect by transferring aspects of causation theory to affective computing. In particular, we assume that the ground truth of affect can be found in the causal relationships between elicitation, manifestation and annotation that remain invariant across tasks and participants. To test our assumption we employ causation-inspired methods for detecting outliers in affective corpora and building affect models that are robust across participants and tasks. We validate our methodology within the domain of digital games, with experimental results showing that it can successfully detect outliers and boost the accuracy of affect models. To the best of our knowledge, this study presents the first attempt to integrate causation tools in affective computing, making a crucial and decisive step towards general affect modeling.

AI · May 2, 2022
Seeding Diversity into AI Art

Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis

This paper argues that generative art driven by conformance to a visual and/or semantic corpus lacks the necessary criteria to be considered creative. Among several issues identified in the literature, we focus on the fact that generative adversarial networks (GANs) that create a single image, in a vacuum, lack a concept of novelty regarding how their product differs from previously created ones. We envision that an algorithm that combines the novelty preservation mechanisms in evolutionary algorithms with the power of GANs can deliberately guide its creative process towards output that is both good and novel. In this paper, we use recent advances in image generation based on semantic prompts using OpenAI's CLIP model, interrupting the GAN's iterative process with short cycles of evolutionary divergent search. The results of evolution are then used to continue the GAN's iterative process; we hypothesise that this intervention will lead to more novel outputs. Testing our hypothesis using novelty search with local competition, a quality-diversity evolutionary algorithm that can increase visual diversity while maintaining quality in the form of adherence to the semantic prompt, we explore how different notions of visual diversity can affect both the process and the product of the algorithm. Results show that even a simplistic measure of visual diversity can help counter a drift towards similar images caused by the GAN. This first experiment opens a new direction for introducing higher intentionality and a more nuanced drive for GANs.

LG · Aug 3, 2023
Lode Enhancer: Level Co-creation Through Scaling

Debosmita Bhaumik, Julian Togelius, Georgios N. Yannakakis et al.

We explore AI-powered upscaling as a design assistance tool in the context of creating 2D game levels. Deep neural networks are used to upscale artificially downscaled patches of levels from the puzzle platformer game Lode Runner. The trained networks are incorporated into a web-based editor, where the user can create and edit levels at three different levels of resolution: 4x4, 8x8, and 16x16. An edit at any resolution instantly transfers to the other resolutions. As upscaling requires inventing features that might not be present at lower resolutions, we train neural networks to reproduce these features. We introduce a neural network architecture that is capable of not only learning upscaling but also giving higher priority to less frequent tiles. To investigate the potential of this tool and guide further development, we conduct a qualitative study with 3 designers to understand how they use it. Designers enjoyed co-designing with the tool, liked its underlying concept, and provided feedback for further improvement.

HC · Sep 19, 2024
Across-Game Engagement Modelling via Few-Shot Learning

Kosmas Pinitas, Konstantinos Makantasis, Georgios N. Yannakakis

Domain generalisation involves learning artificial intelligence (AI) models that can maintain high performance across diverse domains within a specific task. In video games, for instance, such AI models can supposedly learn to detect player actions across different games. Despite recent advancements in AI, domain generalisation for modelling the users' experience remains largely unexplored. While video games present unique challenges and opportunities for the analysis of user experience -- due to their dynamic and rich contextual nature -- modelling such experiences is limited by generally small datasets. As a result, conventional modelling methods often struggle to bridge the domain gap between users and games due to their reliance on large labelled training data and assumptions of common distributions of user experience. In this paper, we tackle this challenge by introducing a framework that decomposes the general domain-agnostic modelling of user experience into several domain-specific and game-dependent tasks that can be solved via few-shot learning. We test our framework on a variation of the publicly available GameVibe corpus, designed specifically to test a model's ability to predict user engagement across different first-person shooter games. Our findings demonstrate the superior performance of few-shot learners over traditional modelling methods and thus showcase the potential of few-shot learning for robust experience modelling in video games and beyond.

CV · Jul 20, 2023
Towards General Game Representations: Decomposing Games Pixels into Content and Style

Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis et al.

On-screen game footage contains rich contextual information that players process when playing and experiencing a game. Learning pixel representations of games can benefit artificial intelligence across several downstream tasks including game-playing agents, procedural content generation, and player modelling. The generalizability of these methods, however, remains a challenge, as learned representations should ideally be shared across games with similar game mechanics. This could allow, for instance, game-playing agents trained on one game to perform well in similar games with no re-training. This paper explores how generalizable pre-trained computer vision encoders can be for such tasks, by decomposing the latent space into content embeddings and style embeddings. The goal is to minimize the domain gap between games of the same genre when it comes to game content critical for downstream tasks, and ignore differences in graphical style. We employ a pre-trained Vision Transformer encoder and a decomposition technique based on game genres to obtain separate content and style embeddings. Our findings show that the decomposed embeddings achieve style invariance across multiple games while still maintaining strong content extraction capabilities. We argue that the proposed decomposition of content and style offers better generalization capacities across game environments independently of the downstream task.

CV · Jul 4, 2022
Game State Learning via Game Scene Augmentation

Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis et al.

Having access to accurate game state information is of utmost importance for any artificial intelligence task including game-playing, testing, player modeling, and procedural content generation. Self-Supervised Learning (SSL) techniques have been shown to be capable of inferring accurate game state information from the high-dimensional pixel input of game footage into compressed latent representations. Contrastive Learning is a popular SSL paradigm where the visual understanding of the game's images comes from contrasting dissimilar and similar game states defined by simple image augmentation methods. In this study, we introduce a new game scene augmentation technique -- named GameCLR -- that takes advantage of the game engine to define and synthesize specific, highly-controlled renderings of different game states, thereby boosting contrastive learning performance. We test our GameCLR technique on images of the CARLA driving simulator environment and compare it against the popular SimCLR baseline SSL method. Our results suggest that GameCLR can infer the game's state information from game footage more accurately compared to the baseline. Our proposed approach allows us to conduct game artificial intelligence research by directly utilizing screen pixels as input.

AI · Jul 25, 2024
Affectively Framework: Towards Human-like Affect-Based Agents

Matthew Barthet, Roberto Gallotta, Ahmed Khalifa et al.

Game environments offer a unique opportunity for training virtual agents due to their interactive nature, which provides diverse play traces and affect labels. Despite their potential, no reinforcement learning framework incorporates human affect models as part of their observation space or reward mechanism. To address this, we present the Affectively Framework, a set of OpenAI Gym environments that integrate affect as part of the observation space. This paper introduces the framework and its three game environments and provides baseline experiments to validate its effectiveness and potential.

HC · Jul 23, 2024
Closing the Affective Loop via Experience-Driven Reinforcement Learning Designers

Matthew Barthet, Diogo Branco, Roberto Gallotta et al.

Autonomously tailoring content to a set of predetermined affective patterns has long been considered the holy grail of affect-aware human-computer interaction at large. The experience-driven procedural content generation framework realises this vision by searching for content that elicits a certain experience pattern to a user. In this paper, we propose a novel reinforcement learning (RL) framework for generating affect-tailored content, and we test it in the domain of racing games. Specifically, the experience-driven RL (EDRL) framework is given a target arousal trace, and it then generates a racetrack that elicits the desired affective responses for a particular type of player. EDRL leverages a reward function that assesses the affective pattern of any generated racetrack from a corpus of arousal traces. Our findings suggest that EDRL can accurately generate affect-driven racing game levels according to a designer's style and outperforms search-based methods for personalised content generation. The method is not only directly applicable to game content generation tasks but also employable broadly to any domain that uses content for affective adaptation.

LG · Jun 20, 2022
Revisiting lp-constrained Softmax Loss: A Comprehensive Study

Chintan Trivedi, Konstantinos Makantasis, Antonios Liapis et al.

Normalization is a vital process for any machine learning task as it controls the properties of data and affects model performance at large. The impact of particular forms of normalization, however, has so far been investigated in limited domain-specific classification tasks and not in a general fashion. Motivated by the lack of such a comprehensive study, in this paper we investigate the performance of lp-constrained softmax loss classifiers across different norm orders, magnitudes, and data dimensions in both proof-of-concept classification problems and real-world popular image classification tasks. Experimental results suggest collectively that lp-constrained softmax loss classifiers not only can achieve more accurate classification results but, at the same time, appear to be less prone to overfitting. The core findings hold across the three popular deep learning architectures tested and eight datasets examined, and suggest that lp normalization is a recommended data representation practice for image classification in terms of performance and convergence, and against overfitting.
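To make the idea concrete, the sketch below rescales a feature vector to a fixed lp norm before a linear layer and softmax cross-entropy. This is a minimal illustration under assumptions: the function names, the identity weight matrix, and the chosen `p` and `magnitude` values are hypothetical, and the paper's exact formulation may differ.

```python
import math


def lp_constrain(features, p=2.0, magnitude=1.0):
    """Rescale `features` so that their lp norm equals `magnitude`.

    Assumes the input is not the zero vector.
    """
    norm = sum(abs(x) ** p for x in features) ** (1.0 / p)
    return [magnitude * x / norm for x in features]


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against the class index `target`."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum - logits[target]


def lp_constrained_softmax_loss(features, weights, target, p=2.0, magnitude=1.0):
    """Constrain the features, apply a linear layer, then softmax cross-entropy."""
    constrained = lp_constrain(features, p, magnitude)
    logits = [sum(w * x for w, x in zip(row, constrained)) for row in weights]
    return softmax_cross_entropy(logits, target)


# Hypothetical two-class example with an identity weight matrix.
weights = [[1.0, 0.0], [0.0, 1.0]]
scaled = lp_constrain([3.0, 4.0], p=2.0, magnitude=10.0)
loss_true_class = lp_constrained_softmax_loss([3.0, 4.0], weights, target=1)
loss_other_class = lp_constrained_softmax_loss([3.0, 4.0], weights, target=0)
```

The norm order `p` and the `magnitude` are the two knobs the study varies; a larger magnitude sharpens the softmax for the same feature direction.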

AI · Jan 20
PREFAB: PREFerence-based Affective Modeling for Low-Budget Self-Annotation

Jaeyoung Moon, Youjin Choi, Yucheon Park et al.

Self-annotation is the gold standard for collecting affective state labels in affective computing. Existing methods typically rely on full annotation, requiring users to continuously label affective states across entire sessions. While this process yields fine-grained data, it is time-consuming, cognitively demanding, and prone to fatigue and errors. To address these issues, we present PREFAB, a low-budget retrospective self-annotation method that targets affective inflection regions rather than full annotation. Grounded in the peak-end rule and ordinal representations of emotion, PREFAB employs a preference-learning model to detect relative affective changes, directing annotators to label only selected segments while interpolating the remainder of the stimulus. We further introduce a preview mechanism that provides brief contextual cues to assist annotation. We evaluate PREFAB through a technical performance study and a 25-participant user study. Results show that PREFAB outperforms baselines in modeling affective inflections while mitigating workload (and conditionally mitigating temporal burden). Importantly, PREFAB improves annotator confidence without degrading annotation quality.

CV · Feb 2, 2024 · Code
BehAVE: Behaviour Alignment of Video Game Encodings

Nemanja Rašajski, Chintan Trivedi, Konstantinos Makantasis et al.

Domain randomisation enhances the transferability of vision models across visually distinct domains with similar content. However, current methods heavily depend on intricate simulation engines, hampering feasibility and scalability. This paper introduces BehAVE, a video understanding framework that utilises existing commercial video games for domain randomisation without accessing their simulation engines. BehAVE taps into the visual diversity of video games for randomisation and uses textual descriptions of player actions to align videos with similar content. We evaluate BehAVE across 25 first-person shooter (FPS) games using various video and text foundation models, demonstrating its robustness in domain randomisation. BehAVE effectively aligns player behavioural patterns and achieves zero-shot transfer to multiple unseen FPS games when trained on just one game. In a more challenging scenario, BehAVE enhances the zero-shot transferability of foundation models to unseen FPS games, even when trained on a game of a different genre, with improvements of up to 22%. BehAVE is available online at https://github.com/nrasajski/BehAVE.

17.3 · AI · May 13
Learning Local Constraints for Reinforcement-Learned Content Generators

Debosmita Bhaumik, Julian Togelius, Georgios N. Yannakakis et al.

Constraint-based game content generators that learn local constraints from existing content, such as Wave Function Collapse (WFC), can generate visually satisfying game levels but face challenges in guaranteeing global properties, such as playability. On the other hand, reinforcement-learning trained generators can guarantee global properties -- because such properties can easily be included in reward functions -- but the results can be visually dissatisfying. In this paper, we explore ways to combine these methods. Specifically, we constrain the action space of a PCGRL generator with constraints learned by WFC, effectively allowing the PCGRL generator to achieve global properties while forced to adhere to local constraints. To better analyze how this hybrid content generation method operates, we vary the number and type of inputs, and we test whether to randomly collapse the starting state and exclude rare patterns. While the method is sensitive to hyperparameter tuning, the best of our trained generators produce visually satisfying and playable puzzle-platform game levels -- such as Lode Runner levels -- with desired global properties.
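The sketch below illustrates one way local constraints could restrict a generator's action space: adjacency pairs are learned from an example level and then used to filter the tiles that may be placed at a position. It rests on simplifying assumptions and is not the authors' implementation. It uses single-tile adjacency instead of the larger patterns WFC usually works with, and the tile symbols, example level, and function names are hypothetical.

```python
def learn_adjacency(level):
    """Collect the left-right and top-bottom tile pairs seen in an example level."""
    horizontal, vertical = set(), set()
    for y, row in enumerate(level):
        for x, tile in enumerate(row):
            if x + 1 < len(row):
                horizontal.add((tile, row[x + 1]))
            if y + 1 < len(level):
                vertical.add((tile, level[y + 1][x]))
    return horizontal, vertical


def allowed_tiles(grid, x, y, tiles, horizontal, vertical):
    """Return the tiles that agree with every already-placed neighbour of (x, y)."""
    def fits(tile):
        checks = []
        if x > 0 and grid[y][x - 1] is not None:
            checks.append((grid[y][x - 1], tile) in horizontal)
        if x + 1 < len(grid[y]) and grid[y][x + 1] is not None:
            checks.append((tile, grid[y][x + 1]) in horizontal)
        if y > 0 and grid[y - 1][x] is not None:
            checks.append((grid[y - 1][x], tile) in vertical)
        if y + 1 < len(grid) and grid[y + 1][x] is not None:
            checks.append((tile, grid[y + 1][x]) in vertical)
        return all(checks)

    return [tile for tile in tiles if fits(tile)]


# Hypothetical tiles: "." empty, "B" brick, "L" ladder.
example_level = ["..L",
                 "BBL"]
horizontal, vertical = learn_adjacency(example_level)

# Partially built level; None marks the cell the generator must fill next.
grid = [[".", None],
        ["B", "B"]]
action_mask = allowed_tiles(grid, 1, 0, ".BL", horizontal, vertical)
```

A reinforcement-learned generator would then choose only among the tiles in `action_mask`, leaving the reward function to handle global properties such as playability.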

CL · Feb 28, 2024
Large Language Models and Games: A Survey and Roadmap

Roberto Gallotta, Graham Todd, Marvin Zammit et al.

Recent years have seen an explosive increase in research on large language models (LLMs), and accompanying public engagement on the topic. While starting as a niche area within natural language processing, LLMs have shown remarkable potential across a broad range of applications and domains, including games. This paper surveys the current state of the art across the various applications of LLMs in and for games, and identifies the different roles LLMs can take within a game. Importantly, we discuss underexplored areas and promising directions for future uses of LLMs in games and we reconcile the potential and limitations of LLMs within the games domain. As the first comprehensive survey and roadmap at the intersection of LLMs and games, we are hopeful that this paper will serve as the basis for groundbreaking research and innovation in this exciting new field.

ML · Jun 4, 2015 · Code
The Preference Learning Toolbox

Vincent E. Farrugia, Héctor P. Martínez, Georgios N. Yannakakis

Preference learning (PL) is a core area of machine learning that handles datasets with ordinal relations. As the amount of generated data of an ordinal nature increases, the importance and role of the PL field become central within machine learning research and practice. This paper introduces an open source, scalable, efficient and accessible preference learning toolbox that supports the key phases of the data training process, incorporating various popular data preprocessing, feature selection and preference learning methods.

SE · Jan 21, 2025
FREYR: A Framework for Recognizing and Executing Your Requests

Roberto Gallotta, Antonios Liapis, Georgios N. Yannakakis

Large language models excel as conversational agents, but their capabilities can be further extended through tool usage, i.e., executable code, to enhance response accuracy or address specialized domains. Current approaches to enable tool usage often rely on model-specific prompting or fine-tuning a model for function-calling instructions. Both approaches have notable limitations, including reduced adaptability to unseen tools and high resource requirements. This paper introduces FREYR, a streamlined framework that modularizes the tool usage process into separate steps. Through this decomposition, we show that FREYR achieves superior performance compared to conventional tool usage methods. We evaluate FREYR on a set of real-world test cases specific to video game design and compare it against traditional tool usage as provided by the Ollama API.

AI · Mar 27, 2025
The Procedural Content Generation Benchmark: An Open-source Testbed for Generative Challenges in Games

Ahmed Khalifa, Roberto Gallotta, Matthew Barthet et al.

This paper introduces the Procedural Content Generation Benchmark for evaluating generative algorithms on different game content creation tasks. The benchmark comes with 12 game-related problems with multiple variants on each problem. Problems vary from creating levels of different kinds to creating rule sets for simple arcade games. Each problem has its own content representation, control parameters, and evaluation metrics for quality, diversity, and controllability. This benchmark is intended as a first step towards a standardized way of comparing generative algorithms. We use the benchmark to score three baseline algorithms: a random generator, an evolution strategy, and a genetic algorithm. Results show that some problems are easier to solve than others, as well as the impact the chosen objective has on quality, diversity, and controllability of the generated artifacts.

CV · Feb 5, 2025
Can Large Language Models Capture Video Game Engagement?

David Melhart, Matthew Barthet, Georgios N. Yannakakis

Can out-of-the-box pretrained Large Language Models (LLMs) detect human affect successfully when observing a video? To address this question, for the first time, we evaluate comprehensively the capacity of popular LLMs to annotate and successfully predict continuous affect annotations of videos when prompted by a sequence of text and video frames in a multimodal fashion. Particularly in this paper, we test LLMs' ability to correctly label changes of in-game engagement in 80 minutes of annotated videogame footage from 20 first-person shooter games of the GameVibe corpus. We run over 2,400 experiments to investigate the impact of LLM architecture, model size, input modality, prompting strategy, and ground truth processing method on engagement prediction. Our findings suggest that while LLMs rightfully claim human-like performance across multiple domains, they generally fall behind in capturing continuous experience annotations provided by humans. We examine some of the underlying causes for the relatively poor overall performance, highlight the cases where LLMs exceed expectations, and draw a roadmap for the further exploration of automated emotion labelling via LLMs.

HCDec 30, 2024
Human-like Bots for Tactical Shooters Using Compute-Efficient Sensors

Niels Justesen, Maria Kaselimi, Sam Snodgrass et al.

Artificial intelligence (AI) has enabled agents to master complex video games, from first-person shooters like Counter-Strike to real-time strategy games such as StarCraft II and racing games like Gran Turismo. While these achievements are notable, applying these AI methods in commercial video game production remains challenging due to computational constraints. In commercial scenarios, the majority of computational resources are allocated to 3D rendering, leaving limited capacity for AI methods, which often demand high computational power, particularly those relying on pixel-based sensors. Moreover, the gaming industry prioritizes creating human-like behavior in AI agents to enhance player experience, unlike academic models that focus on maximizing game performance. This paper introduces a novel methodology for training neural networks via imitation learning to play a complex, commercial-standard, VALORANT-like 2v2 tactical shooter game, requiring only modest CPU hardware during inference. Our approach leverages an innovative, pixel-free perception architecture using a small set of ray-cast sensors, which capture essential spatial information efficiently. These sensors allow AI to perform competently without the computational overhead of traditional methods. Models are trained to mimic human behavior using supervised learning on human trajectory data, resulting in realistic and engaging AI agents. Human evaluation tests confirm that our AI agents provide human-like gameplay experiences while operating efficiently under computational constraints. This offers a significant advancement in AI model development for tactical shooter games and possibly other genres.

NEApr 7, 2024
Dynamic Quality-Diversity Search

Roberto Gallotta, Antonios Liapis, Georgios N. Yannakakis

Evolutionary search via the quality-diversity (QD) paradigm can discover highly performing solutions in different behavioural niches, showing considerable potential in complex real-world scenarios such as evolutionary robotics. Yet most QD methods only tackle static tasks that are fixed over time, which is rarely the case in the real world. Unlike noisy environments, where the fitness of an individual changes slightly at every evaluation, dynamic environments simulate tasks where external factors at unknown and irregular intervals alter the performance of the individual with a severity that is unknown a priori. Literature on optimisation in dynamic environments is extensive, yet such environments have not been explored in the context of QD search. This paper introduces a novel and generalisable Dynamic QD methodology that aims to keep the archive of past solutions updated in the case of environment changes. We also present a novel characterisation of dynamic environments that can be easily applied to well-known benchmarks, with minor interventions to move them from a static task to a dynamic one. Our Dynamic QD intervention is applied on MAP-Elites and CMA-ME, two powerful QD algorithms, and we test the dynamic variants on different dynamic tasks.

LGAug 26, 2025
Emotions as Ambiguity-aware Ordinal Representations

Jingyao Wu, Matthew Barthet, David Melhart et al.

Emotions are inherently ambiguous and dynamic phenomena, yet existing continuous emotion recognition approaches either ignore their ambiguity or treat ambiguity as an independent and static variable over time. Motivated by this gap in the literature, in this paper we introduce ambiguity-aware ordinal emotion representations, a novel framework that captures both the ambiguity present in emotion annotation and the inherent temporal dynamics of emotional traces. Specifically, we propose approaches that model emotion ambiguity through its rate of change. We evaluate our framework on two affective corpora -- RECOLA and GameVibe -- testing our proposed approaches on both bounded (arousal, valence) and unbounded (engagement) continuous traces. Our results demonstrate that ordinal representations outperform conventional ambiguity-aware models on unbounded labels, achieving the highest Concordance Correlation Coefficient (CCC) and Signed Differential Agreement (SDA) scores, highlighting their effectiveness in modeling the traces' dynamics. For bounded traces, ordinal representations excel in SDA, revealing their superior ability to capture relative changes of annotated emotion traces.
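
The Concordance Correlation Coefficient (CCC) named in this abstract can be illustrated with a minimal sketch. It assumes the standard definition by Lin (population variances and covariance) and is not necessarily the exact implementation used by the authors; the function name and argument names are hypothetical.

```python
def concordance_correlation_coefficient(y_true, y_pred):
    """Standard (Lin's) CCC between two equal-length numeric traces.

    Assumption: population (divide-by-n) variance and covariance are used.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("traces must be non-empty and of equal length")

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred) / n
    covariance = sum(
        (t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred)
    ) / n

    # The denominator penalises both a scale mismatch (variances)
    # and a constant offset between the traces (difference of means).
    denominator = var_true + var_pred + (mean_true - mean_pred) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for two identical constant traces")
    return 2.0 * covariance / denominator
```

Unlike Pearson correlation, this score drops below 1 when the predicted trace is perfectly correlated with the annotation but shifted by a constant, which is why it is a common choice for continuous affect traces.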

CLAug 22, 2025
Ethical Considerations of Large Language Models in Game Playing

Qingquan Zhang, Yuchen Li, Bo Yuan et al.

Large language models (LLMs) have demonstrated tremendous potential in game playing, while little attention has been paid to their ethical implications in those contexts. This work investigates and analyses the ethical considerations of applying LLMs in game playing, using Werewolf, also known as Mafia, as a case study. Gender bias, which affects game fairness and player experience, has been observed from the behaviour of LLMs. Some roles, such as the Guard and Werewolf, are more sensitive than others to gender information, presented as a higher degree of behavioural change. We further examine scenarios in which gender information is implicitly conveyed through names, revealing that LLMs still exhibit discriminatory tendencies even in the absence of explicit gender labels. This research showcases the importance of developing fair and ethical LLMs. Beyond our research findings, we discuss the challenges and opportunities that lie ahead in this field, emphasising the need for diving deeper into the ethical implications of LLMs in gaming and other interactive domains.

LGJul 30, 2025
Privileged Contrastive Pretraining for Multimodal Affect Modelling

Kosmas Pinitas, Konstantinos Makantasis, Georgios N. Yannakakis

Affective Computing (AC) has made significant progress with the advent of deep learning, yet a persistent challenge remains: the reliable transfer of affective models from controlled laboratory settings (in-vitro) to uncontrolled real-world environments (in-vivo). To address this challenge, we introduce the Privileged Contrastive Pretraining (PriCon) framework, according to which models are first pretrained via supervised contrastive learning (SCL) and then act as teacher models within a Learning Using Privileged Information (LUPI) framework. PriCon both leverages privileged information during training and enhances the robustness of derived affect models via SCL. Experiments conducted on two benchmark affective corpora, RECOLA and AGAIN, demonstrate that models trained using PriCon consistently outperform LUPI and end-to-end models. Remarkably, in many cases, PriCon models achieve performance comparable to models trained with access to all modalities during both training and testing. The findings underscore the potential of PriCon as a paradigm towards further bridging the gap between in-vitro and in-vivo affective modelling, offering a scalable and practical solution for real-world applications.

AIJun 24, 2025
Evolutionary Level Repair

Debosmita Bhaumik, Julian Togelius, Georgios N. Yannakakis et al.

We address the problem of game level repair, which consists of taking a designed but non-functional game level and making it functional. This might consist of ensuring the completeness of the level, reachability of objects, or other performance characteristics. The repair problem may also be constrained in that it can only make a small number of changes to the level. We investigate search-based solutions to the level repair problem, particularly using evolutionary and quality-diversity algorithms, with good results. This level repair method is applied to levels generated using a machine learning-based procedural content generation (PCGML) method that generates stylistically appropriate but frequently broken levels. This combination of PCGML for generation and search-based methods for repair shows great promise as a hybrid procedural content generation (PCG) method.

HCJun 17, 2024
GameVibe: A Multimodal Affective Game Corpus

Matthew Barthet, Maria Kaselimi, Kosmas Pinitas et al.

As online video and streaming platforms continue to grow, affective computing research has undergone a shift towards more complex studies involving multiple modalities. However, there is still a lack of readily available datasets with high-quality audiovisual stimuli. In this paper, we present GameVibe, a novel affect corpus which consists of multimodal audiovisual stimuli, including in-game behavioural observations and third-person affect traces for viewer engagement. The corpus consists of videos from a diverse set of publicly available gameplay sessions across 30 games, with particular attention to ensuring high-quality stimuli with good audiovisual and gameplay diversity. Furthermore, we present an analysis of the reliability of the annotators in terms of inter-annotator agreement.

HCMay 18, 2023
From the Lab to the Wild: Affect Modeling via Privileged Information

Konstantinos Makantasis, Kosmas Pinitas, Antonios Liapis et al.

How can we reliably transfer affect models trained in controlled laboratory conditions (in-vitro) to uncontrolled real-world settings (in-vivo)? The information gap between in-vitro and in-vivo applications defines a core challenge of affective computing. This gap is caused by limitations related to affect sensing including intrusiveness, hardware malfunctions and availability of sensors. As a response to these limitations, we introduce the concept of privileged information for operating affect models in real-world scenarios (in the wild). Privileged information enables affect models to be trained across multiple modalities available in a lab, and ignore, without significant performance drops, those modalities that are not available when they operate in the wild. Our approach is tested in two multimodal affect databases, one of which is designed for testing models of affect in the wild. By training our affect models using all modalities and then using solely raw footage frames for testing the models, we reach the performance of models that fuse all available modalities for both training and testing. The results are robust across both classification and regression affect modeling tasks, which are dominant paradigms in affective computing. Our findings make a decisive step towards realizing affect interaction in the wild.

HCMay 12, 2023
The Ethics of AI in Games

David Melhart, Julian Togelius, Benedikte Mikkelsen et al.

Video games are one of the richest and most popular forms of human-computer interaction and, hence, their role is critical for our understanding of human behaviour and affect at a large scale. As artificial intelligence (AI) tools are gradually adopted by the game industry, a series of ethical concerns arise. Such concerns, however, have so far not been extensively discussed in a video game context. Motivated by the lack of a comprehensive review of the ethics of AI as applied to games, we survey the current state of the art in this area and discuss ethical considerations of these systems from the holistic perspective of the affective loop. Through the components of this loop, we study the ethical challenges that AI faces in video game development. Elicitation highlights the ethical boundaries of artificially induced emotions; sensing showcases the trade-off between privacy and safe gaming spaces; and detection, as utilised during in-game adaptation, poses challenges to transparency and ownership. This paper calls for an open dialogue and action for the games of today and the virtual spaces of the future. By setting an appropriate framework we aim to protect users and to guide developers towards safer and better experiences for their customers.

HCDec 11, 2021
Architectural Form and Affect: A Spatiotemporal Study of Arousal

Emmanouil Xylakis, Antonios Liapis, Georgios N. Yannakakis

How does the form of our surroundings impact the ways we feel? This paper extends the body of research on the effects that space and light have on emotion by focusing on critical features of architectural form and illumination colors and their spatiotemporal impact on arousal. For that purpose, we solicited a corpus of spatial transitions in video form, lasting over 60 minutes, annotated by three participants in terms of arousal in a time-continuous and unbounded fashion. We process the annotation traces of that corpus in a relative fashion, focusing on the direction of arousal changes (increasing or decreasing) as affected by changes between consecutive rooms. Results show that properties of the form such as curved or complex spaces align highly with increased arousal. The analysis presented in this paper sheds some initial light on the relationship between arousal and core spatiotemporal features of form, which is of particular importance for the affect-driven design of architectural spaces.

HCOct 3, 2021
Towards General Models of Player Experience: A Study Within Genres

David Melhart, Antonios Liapis, Georgios N. Yannakakis

To which degree can abstract gameplay metrics capture the player experience in a general fashion within a game genre? In this comprehensive study we address this question across three different videogame genres: racing, shooter, and platformer games. Using high-level gameplay features that feed preference learning models we are able to predict arousal accurately across different games of the same genre in a large-scale dataset of over 1,000 arousal-annotated play sessions. Our genre models predict changes in arousal with up to 74% accuracy on average across all genres and 86% in the best cases. We also examine the feature importance during the modelling process and find that time-related features largely contribute to the performance of both game and genre models. The prominence of these game-agnostic features shows the importance of the temporal dynamics of the play experience in modelling, but also highlights some of the challenges for the future of general affect modelling in games and beyond.

CVSep 30, 2021
AffectGAN: Affect-Based Generative Art Driven by Semantics

Theodoros Galanos, Antonios Liapis, Georgios N. Yannakakis

This paper introduces a novel method for generating artistic images that express particular affective states. Leveraging state-of-the-art deep learning methods for visual generation (through generative adversarial networks), semantic models from OpenAI, and the annotated dataset of the visual art encyclopedia WikiArt, our AffectGAN model is able to generate images based on specific or broad semantic prompts and intended affective outcomes. A small dataset of 32 images generated by AffectGAN is annotated by 50 participants in terms of the particular emotion they elicit, as well as their quality and novelty. Results show that for most instances the intended emotion used as a prompt for image generation matches the participants' responses. This small-scale study brings forth a new vision towards blending affective computing with computational creativity, enabling generative systems with intentionality in terms of the emotions they wish their output to elicit.

LGSep 24, 2021
Go-Blend behavior and affect

Matthew Barthet, Antonios Liapis, Georgios N. Yannakakis

This paper proposes a paradigm shift for affective computing by viewing the affect modeling task as a reinforcement learning process. According to our proposed framework the context (environment) and the actions of an agent define the common representation that interweaves behavior and affect. To realise this framework we build on recent advances in reinforcement learning and use a modified version of the Go-Explore algorithm which has showcased supreme performance in hard exploration tasks. In this initial study, we test our framework in an arcade game by training Go-Explore agents to both play optimally and attempt to mimic human demonstrations of arousal. We vary the degree of importance between optimal play and arousal imitation and create agents that can effectively display a palette of affect and behavioral patterns. Our Go-Explore implementation not only introduces a new paradigm for affect modeling; it empowers believable AI-based game testing by providing agents that can blend and express a multitude of behavioral and affective patterns.

HCJul 22, 2021
Privileged Information for Modeling Affect In The Wild

Konstantinos Makantasis, David Melhart, Antonios Liapis et al.

A key challenge of affective computing research is discovering ways to reliably transfer affect models that are built in the laboratory to real world settings, namely in the wild. The existing gap between in vitro and in vivo affect applications is mainly caused by limitations related to affect sensing, including intrusiveness, hardware malfunctions, and availability of sensors, but also privacy and security. As a response to these limitations, in this paper we are inspired by recent advances in machine learning and introduce the concept of privileged information for operating affect models in the wild. The presence of privileged information enables affect models to be trained across multiple modalities available in a lab setting and ignore modalities that are not available in the wild with no significant drop in their modeling performance. The proposed privileged information framework is tested in a game arousal corpus that contains physiological signals in the form of heart rate and electrodermal activity, game telemetry, and pixels of footage from two dissimilar games that are annotated with arousal traces. By training our arousal models using all modalities (in vitro) and using solely pixels for testing the models (in vivo), we reach levels of accuracy obtained from models that fuse all modalities both for training and testing. The findings of this paper make a decisive step towards realizing affect interaction in the wild.

LGJul 7, 2021
Keiki: Towards Realistic Danmaku Generation via Sequential GANs

Ziqi Wang, Jialin Liu, Georgios N. Yannakakis

Search-based procedural content generation methods have recently been introduced for the autonomous creation of bullet hell games. Search-based methods, however, can hardly model patterns of danmakus -- the bullet hell shooting entity -- explicitly and the resulting levels often look non-realistic. In this paper, we present a novel bullet hell game platform named Keiki, which allows the representation of danmakus as a parametric sequence which, in turn, can model the sequential behaviours of danmakus. We employ three types of generative adversarial networks (GANs) and test Keiki across three metrics designed to quantify the quality of the generated danmakus. The time-series GAN and periodic spatial GAN show different yet competitive performance in terms of the evaluation metrics adopted, their deviation from human-designed danmakus, and the diversity of generated danmakus. The preliminary experimental studies presented here showcase the potential of time-series GANs for sequential content generation in games.

AIJun 30, 2021
Experience-Driven PCG via Reinforcement Learning: A Super Mario Bros Study

Tianye Shu, Jialin Liu, Georgios N. Yannakakis

We introduce a procedural content generation (PCG) framework at the intersections of experience-driven PCG and PCG via reinforcement learning, named ED(PCG)RL, EDRL in short. EDRL is able to teach RL designers to generate endless playable levels in an online manner while respecting particular experiences for the player as designed in the form of reward functions. The framework is tested initially in the Super Mario Bros game. In particular, the RL designers of Super Mario Bros generate and concatenate level segments while considering the diversity among the segments. The correctness of the generation is ensured by a neural net-assisted evolutionary level repairer and the playability of the whole level is determined through AI-based testing. Our agents in this EDRL implementation learn to maximise a quantification of Koster's principle of fun by moderating the degree of diversity across level segments. Moreover, we test their ability to design fun levels that are diverse over time and playable. Our proposed framework is capable of generating endless, playable Super Mario Bros levels with varying degrees of fun, deviation from earlier segments, and playability. EDRL can be generalised to any game that is built as a segment-based sequential process and features a built-in compressed representation of its game content.

CVJun 18, 2021
Contrastive Learning of Generalized Game Representations

Chintan Trivedi, Antonios Liapis, Georgios N. Yannakakis

Representing games through their pixels offers a promising approach for building general-purpose and versatile game models. While games are not merely images, neural network models trained on game pixels often capture differences of the visual style of the image rather than the content of the game. As a result, such models cannot generalize well even within similar games of the same genre. In this paper we build on recent advances in contrastive learning and showcase its benefits for representation learning in games. Learning to contrast images of games not only classifies games in a more efficient manner; it also yields models that separate games in a more meaningful fashion by ignoring the visual style and focusing, instead, on their content. Our results in a large dataset of sports video games containing 100k images across 175 games and 10 game genres suggest that contrastive learning is better suited for learning generalized game representations compared to conventional supervised learning. The findings of this study bring us closer to universal visual encoders for games that can be reused across previously unseen games without requiring retraining or fine-tuning.

NEApr 18, 2021
Monte Carlo Elites: Quality-Diversity Selection as a Multi-Armed Bandit Problem

Konstantinos Sfikas, Antonios Liapis, Georgios N. Yannakakis

A core challenge of evolutionary search is the need to balance between exploration of the search space and exploitation of highly fit regions. Quality-diversity search has explicitly walked this tightrope between a population's diversity and its quality. This paper extends a popular quality-diversity search algorithm, MAP-Elites, by treating the selection of parents as a multi-armed bandit problem. Using variations of the upper-confidence bound to select parents from under-explored but potentially rewarding areas of the search space can accelerate the discovery of new regions as well as improve the archive's total quality. The paper tests an indirect measure of quality for parent selection: the survival rate of a parent's offspring. Results show that maintaining a balance between exploration and exploitation leads to the most diverse and high-quality set of solutions in three different testbeds.
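
The idea of treating parent selection as a multi-armed bandit can be sketched as follows. This is a minimal illustration that assumes the textbook UCB1 rule, whereas the abstract refers to unspecified variations of the upper-confidence bound; the function name, the `stats` layout, and the exploration constant are all hypothetical. Each occupied archive cell is one arm, and its reward is the survival rate of the offspring it has produced so far.

```python
import math


def select_parent_cell(stats, exploration=math.sqrt(2)):
    """Pick an archive cell to act as parent using the UCB1 rule.

    stats maps a cell key to (times_selected, surviving_offspring).
    Assumption: one offspring per selection, so the survival rate is
    surviving_offspring / times_selected.
    """
    total_selections = sum(selected for selected, _ in stats.values())
    best_cell, best_score = None, -math.inf

    for cell, (selected, survived) in stats.items():
        if selected == 0:
            # An untried cell has an unbounded upper confidence bound.
            return cell
        survival_rate = survived / selected  # exploitation term
        bonus = exploration * math.sqrt(math.log(total_selections) / selected)
        score = survival_rate + bonus
        if score > best_score:
            best_cell, best_score = cell, score

    return best_cell  # None when the archive is empty
```

With equal selection counts the cell with the higher offspring survival rate wins, while a rarely selected cell can still be chosen through its exploration bonus. For example, `select_parent_cell({"a": (100, 50), "b": (1, 0)})` returns `"b"`.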

NEApr 18, 2021
ARCH-Elites: Quality-Diversity for Urban Design

Theodoros Galanos, Antonios Liapis, Georgios N. Yannakakis et al.

This paper introduces ARCH-Elites, a MAP-Elites implementation that can reconfigure large-scale urban layouts at real-world locations via a pre-trained surrogate model instead of costly simulations. In a series of experiments, we generate novel urban designs for two real-world locations in Boston, Massachusetts. Combining the exploration of a possibility space with real-time performance evaluation creates a powerful new paradigm for architectural generative design that can extract and articulate design intelligence.

HCApr 6, 2021
The Arousal video Game AnnotatIoN (AGAIN) Dataset

David Melhart, Antonios Liapis, Georgios N. Yannakakis

How can we model affect in a general fashion, across dissimilar tasks, and to which degree are such general representations of affect even possible? To address such questions and enable research towards general affective computing, this paper introduces The Arousal video Game AnnotatIoN (AGAIN) dataset. AGAIN is a large-scale affective corpus that features over 1,100 in-game videos (with corresponding gameplay data) from nine different games, which are annotated for arousal by 124 participants in a first-person continuous fashion. Even though AGAIN is created for the purpose of investigating the generality of affective computing across dissimilar tasks, affect modelling can be studied within each of its 9 specific interactive games. To the best of our knowledge AGAIN is the largest -- over 37 hours of annotated video and game logs -- and most diverse publicly available affective dataset based on games as interactive affect elicitors.

LGMar 29, 2021
Pairing Character Classes in a Deathmatch Shooter Game via a Deep-Learning Surrogate Model

Daniel Karavolos, Antonios Liapis, Georgios N. Yannakakis

This paper introduces a surrogate model of gameplay that learns the mapping between different game facets, and applies it to a generative system which designs new content in one of these facets. Focusing on the shooter game genre, the paper explores how deep learning can help build a model which combines the game level structure and the game's character class parameters as input and the gameplay outcomes as output. The model is trained on a large corpus of game data from simulations with artificial agents in random sets of levels and class parameters. The model is then used to generate classes for specific levels and for a desired game outcome, such as balanced matches of short duration. Findings in this paper show that the system can be expressive and can generate classes for both computer-generated and human-authored levels.

AIMar 22, 2021
Transforming Exploratory Creativity with DeLeNoX

Antonios Liapis, Hector P. Martinez, Julian Togelius et al.

We introduce DeLeNoX (Deep Learning Novelty Explorer), a system that autonomously creates artifacts in constrained spaces according to its own evolving interestingness criterion. DeLeNoX proceeds in alternating phases of exploration and transformation. In the exploration phases, a version of novelty search augmented with constraint handling searches for maximally diverse artifacts using a given distance function. In the transformation phases, a deep learning autoencoder learns to compress the variation between the found artifacts into a lower-dimensional space. The newly trained encoder is then used as the basis for a new distance function, transforming the criteria for the next exploration phase. In the current paper, we apply DeLeNoX to the creation of spaceships suitable for use in two-dimensional arcade-style computer games, a representative problem in procedural content generation in games. We also situate DeLeNoX in relation to the distinction between exploratory and transformational creativity, and in relation to Schmidhuber's theory of creativity through the drive for compression progress.
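
The interplay between the two phases can be illustrated with a minimal sketch of a novelty score whose distance function is swappable. It assumes the common novelty search formulation (mean distance to the k nearest neighbours) and a Euclidean distance; the abstract states neither. The `encode` argument is a hypothetical stand-in for the trained autoencoder's encoder, and constraint handling is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty_score(candidate, others, k=3, encode=lambda vector: vector):
    """Mean distance from candidate to its k nearest neighbours.

    Distances are measured after applying `encode`, so replacing the
    encoder changes what counts as novel (the transformation phase).
    """
    target = encode(candidate)
    distances = sorted(euclidean(target, encode(other)) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0
```

Passing a different `encode` function changes the ranking of the same artifacts. With `others = [(3, 4), (6, 8), (0, 1)]` and `k=2`, the candidate `(0, 0)` scores 3.0 in the raw space but 1.5 under a toy encoder that keeps only the first coordinate.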

HCJan 26, 2021
The Pixels and Sounds of Emotion: General-Purpose Representations of Arousal in Games

Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

What if emotion could be captured in a general and subject-agnostic fashion? Is it possible, for instance, to design general-purpose representations that detect affect solely from the pixels and audio of a human-computer interaction video? In this paper we address the above questions by evaluating the capacity of deep learned representations to predict affect by relying only on audiovisual information of videos. We assume that the pixels and audio of an interactive session embed the necessary information required to detect affect. We test our hypothesis in the domain of digital games and evaluate the degree to which deep classifiers and deep preference learning algorithms can learn to predict the arousal of players based only on the video footage of their gameplay. Our results from four dissimilar games suggest that general-purpose representations can be built across games as the arousal models obtain average accuracies as high as 85% using the challenging leave-one-video-out cross-validation scheme. The dissimilar audiovisual characteristics of the tested games showcase the strengths and limitations of the proposed method.

AIOct 9, 2020
Deep Learning for Procedural Content Generation

Jialin Liu, Sam Snodgrass, Ahmed Khalifa et al.

Procedural content generation in video games has a long history. Existing procedural content generation methods, such as search-based, solver-based, rule-based and grammar-based methods have been applied to various content types such as levels, maps, character models, and textures. A research field centered on content generation in games has existed for more than a decade. More recently, deep learning has powered a remarkable range of inventions in content production, which are applicable to games. While some cutting-edge deep learning methods are applied on their own, others are applied in combination with more traditional methods, or in an interactive setting. This article surveys the various deep learning methods that have been applied to generate game content directly or indirectly, discusses deep learning methods that could be used for content generation purposes but are rarely used today, and envisages some limitations and potential future directions of deep learning for procedural content generation.

HCAug 17, 2020
Moment-to-moment Engagement Prediction through the Eyes of the Observer: PUBG Streaming on Twitch

David Melhart, Daniele Gravina, Georgios N. Yannakakis

Is it possible to predict moment-to-moment gameplay engagement based solely on game telemetry? Can we reveal engaging moments of gameplay by observing the way the viewers of the game behave? To address these questions in this paper, we reframe the way gameplay engagement is defined and we view it, instead, through the eyes of a game's live audience. We build prediction models for viewers' engagement based on data collected from the popular battle royale game PlayerUnknown's Battlegrounds as obtained from the Twitch streaming service. In particular, we collect viewers' chat logs and in-game telemetry data from several hundred matches of five popular streamers (containing over 100,000 game events) and machine learn the mapping between gameplay and viewer chat frequency during play, using small neural network architectures. Our key findings showcase that engagement models trained solely on 40 gameplay features can reach accuracies of up to 80% on average and 84% at best. Our models are scalable and generalisable as they perform equally well within- and across-streamers, as well as across streamer play styles.