CLMar 17, 2023
CoLT5: Faster Long-Range Transformers with Conditional ComputationJoshua Ainslie, Tao Lei, Michiel de Jong et al. · deepmind
Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
CLAug 28, 2023
MEMORY-VQ: Compression for Tractable Internet-Scale MemoryYury Zemlyanskiy, Michiel de Jong, Luke Vilnis et al.
Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
LGSep 29, 2023Code
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning PlatformShengyi Huang, Jiayi Weng, Rujikorn Charakorn et al.
Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time. Despite recent progress in the field, reproducibility issues have not been sufficiently explored. This paper first shows that the typical actor-learner framework can have reproducibility issues even if hyperparameters are controlled. We then introduce Cleanba, a new open-source platform for distributed DRL that proposes a highly reproducible architecture. Cleanba implements highly optimized distributed variants of PPO and IMPALA. Our Atari experiments show that these variants can obtain equivalent or higher scores than strong IMPALA baselines in moolib and torchbeast and PPO baseline in CleanRL. However, Cleanba variants present 1) shorter training time and 2) more reproducible learning curves in different hardware settings. Cleanba's source code is available at \url{https://github.com/vwxyzjn/cleanba}
LGMay 18, 2022
A2C is a special case of PPOShengyi Huang, Anssi Kanervisto, Antonin Raffin et al.
Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different than A2C's objective. In this paper, however, we show A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using \texttt{Stable-baselines3}, showing A2C and PPO produce the \textit{exact} same models when other settings are controlled.
AIFeb 18, 2023
Improving Fairness in Adaptive Social Exergames via Shapley BanditsRobert C. Gray, Jennifer Villareale, Thomas B. Fox et al.
Algorithmic fairness is an essential requirement as AI becomes integrated in society. In the case of social applications where AI distributes resources, algorithms often must make decisions that will benefit a subset of users, sometimes repeatedly or exclusively, while attempting to maximize specific outcomes. How should we design such systems to serve users more fairly? This paper explores this question in the case where a group of users works toward a shared goal in a social exergame called Step Heroes. We identify adverse outcomes in traditional multi-armed bandits (MABs) and formalize the Greedy Bandit Problem. We then propose a solution based on a new type of fairness-aware multi-armed bandit, Shapley Bandits. It uses the Shapley Value for increasing overall player participation and intervention adherence rather than the maximization of total group output, which is traditionally achieved by favoring only high-performing participants. We evaluate our approach via a user study (n=46). Our results indicate that our Shapley Bandits effectively mediates the Greedy Bandit Problem and achieves better user retention and motivation across the participants.
LGMay 21, 2021Code
Gym-$μ$RTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement LearningShengyi Huang, Santiago Ontañón, Chris Bamford et al.
In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that could defeat professional players in StarCraft~II. However, existing approaches to tackle full games have high computational costs, usually requiring the use of thousands of GPUs and CPUs for weeks. This paper has two main contributions to address this issue: 1) We introduce Gym-$μ$RTS (pronounced "gym-micro-RTS") as a fast-to-run RL environment for full-game RTS research and 2) we present a collection of techniques to scale DRL to play full-game $μ$RTS as well as ablation studies to demonstrate their empirical importance. Our best-trained bot can defeat every $μ$RTS bot we tested from the past $μ$RTS competitions when working in a single-map setting, resulting in a state-of-the-art DRL agent while only taking about 60 hours of training using a single machine (one GPU, three vCPU, 16GB RAM). See the blog post at https://wandb.ai/vwxyzjn/gym-microrts-paper/reports/Gym-RTS-Toward-Affordable-Deep-Reinforcement-Learning-Research-in-Real-Time-Strategy-Games--Vmlldzo2MDIzMTg and the source code at https://github.com/vwxyzjn/gym-microrts-paper
LGJun 25, 2020Code
A Closer Look at Invalid Action Masking in Policy Gradient AlgorithmsShengyi Huang, Santiago Ontañón
In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to "mask out" invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking. The source code can be found at https://github.com/vwxyzjn/invalid-action-masking
AIApr 23, 2016Code
RHOG: A Refinement-Operator Library for Directed Labeled GraphsSantiago Ontañón
This document provides the foundations behind the functionality provided by the $ρ$G library (https://github.com/santiontanon/RHOG), focusing on the basic operations the library provides: subsumption, refinement of directed labeled graphs, and distance/similarity assessment between directed labeled graphs. $ρ$G development was initially supported by the National Science Foundation, by the EAGER grant IIS-1551338.
CLMay 18, 2023
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer SequencesDavid Uthus, Santiago Ontañón, Joshua Ainslie et al.
We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2. We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.
CVNov 12, 2021
Identifying On-road Scenarios Predictive of ADHD usingDriving Simulator Time Series DataDavid Grethlein, Aleksanteri Sladek, Santiago Ontañón
In this paper we introduce a novel algorithm called Iterative Section Reduction (ISR) to automatically identify sub-intervals of spatiotemporal time series that are predictive of a target classification task. Specifically, using data collected from a driving simulator study, we identify which spatial regions (dubbed "sections") along the simulated routes tend to manifest driving behaviors that are predictive of the presence of Attention Deficit Hyperactivity Disorder (ADHD). Identifying these sections is important for two main reasons: (1) to improve predictive accuracy of the trained models by filtering out non-predictive time series sub-intervals, and (2) to gain insights into which on-road scenarios (dubbed events) elicit distinctly different driving behaviors from patients undergoing treatment for ADHD versus those that are not. Our experimental results show both improved performance over prior efforts (+10% accuracy) and good alignment between the predictive sections identified and scripted on-road events in the simulator (negotiating turns and curves).
LGOct 8, 2021
Iterative Decoding for Compositional Generalization in TransformersLuana Ruiz, Joshua Ainslie, Santiago Ontañón
Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for longer examples than those seen at training. This paper introduces iterative decoding, an alternative to seq2seq that (i) improves transformer compositional generalization in the PCFG and Cartesian product datasets and (ii) evidences that, in these datasets, seq2seq transformers do not learn iterations that are not unrolled. In iterative decoding, training examples are broken down into a sequence of intermediate steps that the transformer learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. We conclude by illustrating some limitations of iterative decoding in the CFQ dataset.
AIAug 9, 2021
Making Transformers Solve Compositional TasksSantiago Ontañón, Joshua Ainslie, Vaclav Cvicek et al.
Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in a diverse set of compositional tasks, and that achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).
LGJun 19, 2021
Improving Compositional Generalization in Classification Tasks via Structure AnnotationsJuyong Kim, Pradeep Ravikumar, Joshua Ainslie et al.
Compositional generalization is the ability to generalize systematically to a new data distribution by combining known components. Although humans seem to have a great ability to generalize compositionally, state-of-the-art neural models struggle to do so. In this work, we study compositional generalization in classification tasks and present two main contributions. First, we study ways to convert a natural language sequence-to-sequence dataset to a classification dataset that also requires compositional generalization. Second, we show that providing structural hints (specifically, providing parse trees and entity links as attention masks for a Transformer model) helps compositional generalization.
SDMay 20, 2021
Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search QueriesSukhdeep S. Sodhi, Ellie Ka-In Chio, Ambarish Jash et al.
As more and more online search queries come from voice, automatic speech recognition becomes a key component to deliver relevant search results. Errors introduced by automatic speech recognition (ASR) lead to irrelevant search results returned to the user, thus causing user dissatisfaction. In this paper, we introduce an approach, Mondegreen, to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints or privacy or bandwidth (for example, some ASR systems run on-device) considerations. We focus on voice queries transcribed via several proprietary commercial ASR systems. These queries come from users making internet, or online service search queries. We first present an analysis showing how different the language distribution coming from user voice queries is from that in traditional text corpora used to train off-the-shelf ASR systems. We then demonstrate that Mondegreen can achieve significant improvements in increased user interaction by correcting user voice queries in one of the largest search systems in Google. Finally, we see Mondegreen as complementing existing highly-optimized production ASR systems, which may not be frequently retrained and thus lag behind due to vocabulary drifts.
HCMar 2, 2021
The Personalization Paradox: the Conflict between Accurate User Models and Personalized Adaptive SystemsSantiago Ontañón, Jichen Zhu
Personalized adaptation technology has been adopted in a wide range of digital applications such as health, training and education, e-commerce and entertainment. Personalization systems typically build a user model, aiming to characterize the user at hand, and then use this model to personalize the interaction. Personalization and user modeling, however, are often intrinsically at odds with each other (a fact some times referred to as the personalization paradox). In this paper, we take a closer look at this personalization paradox, and identify two ways in which it might manifest: feedback loops and moving targets. To illustrate these issues, we report results in the domain of personalized exergames (videogames for physical exercise), and describe our early steps to address some of the issues arisen by the personalization paradox.
AIFeb 15, 2021
Player-Centered AI for Automatic Game Personalization: Open ProblemsJichen Zhu, Santiago Ontañón
Computer games represent an ideal research domain for the next generation of personalized digital applications. This paper presents a player-centered framework of AI for game personalization, complementary to the commonly used system-centered approaches. Built on the Structure of Actions theory, the paper maps out the current landscape of game personalization research and identifies eight open problems that need further investigation. These problems require deep collaboration between technological advancement and player experience design.
AIFeb 10, 2021
Player Modeling via Multi-Armed BanditsRobert C. Gray, Jichen Zhu, Dannielle Arigo et al.
This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: The first is a novel approach to player modeling based on multi-armed bandits (MABs). This approach addresses, at the same time and in a principled way, both the problem of collecting data to model the characteristics of interest for the current player and the problem of adapting the interactive experience based on this model. Second, we present an approach to evaluating and fine-tuning these algorithms prior to generating data in a user study. This is an important problem, because conducting user studies is an expensive and labor-intensive process; therefore, an ability to evaluate the algorithms beforehand can save a significant amount of resources. We evaluate our approach in the context of modeling players' social comparison orientation (SCO) and present empirical results from both simulations and real players.
LGFeb 10, 2021
Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed BanditsRobert C. Gray, Jichen Zhu, Santiago Ontañón
This paper explores multi-armed bandit (MAB) strategies in very short horizon scenarios, i.e., when the bandit strategy is only allowed very few interactions with the environment. This is an understudied setting in the MAB literature with many applications in the context of games, such as player modeling. Specifically, we pursue three different ideas. First, we explore the use of regression oracles, which replace the simple average used in strategies such as epsilon-greedy with linear regression models. Second, we examine different exploration patterns such as forced exploration phases. Finally, we introduce a new variant of the UCB1 strategy called UCBT that has interesting properties and no tunable parameters. We present experimental results in a domain motivated by exergames, where the goal is to maximize a player's daily steps. Our results show that the combination of epsilon-greedy or epsilon-decreasing with regression oracles outperforms all other tested strategies in the short horizon setting.
HCJan 25, 2021
Personalization Paradox in Behavior Change Apps: Lessons from a Social Comparison-Based Personalized App for Physical ActivityJichen Zhu, Diane H. Dallal, Robert C. Gray et al.
Social comparison-based features are widely used in social computing apps. However, most existing apps are not grounded in social comparison theories and do not consider individual differences in social comparison preferences and reactions. This paper is among the first to automatically personalize social comparison targets. In the context of an m-health app for physical activity, we use artificial intelligence (AI) techniques of multi-armed bandits. Results from our user study (n=53) indicate that there is some evidence that motivation can be increased using the AI-based personalization of social comparison. The detected effects achieved small-to-moderate effect sizes, illustrating the real-world implications of the intervention for enhancing motivation and physical activity. In addition to design implications for social comparison features in social apps, this paper identified the personalization paradox, the conflict between user modeling and adaptation, as a key design challenge of personalized applications for behavior change. Additionally, we propose research directions to mitigate this Personalization Paradox.
LGOct 5, 2020
Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy GamesShengyi Huang, Santiago Ontañón
Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward. To tackle this problem, a common approach is to use reward shaping to help exploration. However, an important drawback of reward shaping is that agents sometimes learn to optimize the shaped reward instead of the true objective. In this paper, we present a novel technique that we call action guidance that successfully trains agents to eventually optimize the true objective in games with sparse rewards while maintaining most of the sample efficiency that comes with reward shaping. We evaluate our approach in a simplified real-time strategy (RTS) game simulator called $μ$RTS.
HCMay 10, 2020
Understanding Learners' Problem-Solving Strategies in Concurrent and Parallel Programming: A Game-Based ApproachJichen Zhu, Katelyn Alderfer, Brian Smith et al.
Concurrent and parallel programming (CPP) is an increasingly important subject in Computer Science Education. However, the conceptual shift from sequential programming is notoriously difficult to make. Currently, relatively little research exists on how people learn CPP core concepts. This paper presents our results of using Parallel, an educational game about CPP, focusing on the learners' self-efficacy and how they learn CPP concepts. Based on a study of 44 undergraduate students, our research shows that (a) self-efficacy increased significantly after playing the game; (b) the problem-solving strategies employed by students playing the game can be classified in three main types: trial and error, single-thread, and multi-threaded strategies, and (c) that self-efficacy is correlated with the percentage of time students spend in multithreaded problem-solving.
AIFeb 18, 2020
An Overview of Distance and Similarity Functions for Structured DataSantiago Ontañón
The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.
LGOct 26, 2019
Comparing Observation and Action Representations for Deep Reinforcement Learning in $μ$RTSShengyi Huang, Santiago Ontañón
This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation where the observation represents the whole game state, and the RL agent needs to choose which unit to issue actions to, and which actions to execute; and (2) a local representation where the observation is represented from the point of view of an individual unit, and the RL agent picks actions for each unit independently. We evaluate these representations in $μ$RTS showing that the local representation seems to outperform the global representation when training agents with the task of harvesting resources.
AIAug 15, 2019
Tracing Player Knowledge in a Parallel Programming Educational GamePavan Kantharaju, Katelyn Alderfer, Jichen Zhu et al.
This paper focuses on "tracing player knowledge" in educational games. Specifically, given a set of concepts or skills required to master a game, the goal is to estimate the likelihood with which the current player has mastery of each of those concepts or skills. The main contribution of the paper is an approach that integrates machine learning and domain knowledge rules to find when the player applied a certain skill and either succeeded or failed. This is then given as input to a standard knowledge tracing module (such as those from Intelligent Tutoring Systems) to perform knowledge tracing. We evaluate our approach in the context of an educational game called "Parallel" to teach parallel and concurrent programming with data collected from real users, showing our approach can predict students skills with a low mean-squared error.
HCJul 4, 2019
Experience Management in Multi-player GamesJichen Zhu, Santiago Ontañón
Experience Management studies AI systems that automatically adapt interactive experiences such as games to tailor to specific players and to fulfill design goals. Although it has been explored for several decades, existing work in experience management has mostly focused on single-player experiences. This paper is a first attempt at identifying the main challenges to expand EM to multi-player/multi-user games or experiences. We also make connections to related areas where solutions for similar problems have been proposed (especially group recommender systems) and discusses the potential impact and applications of multi-player EM.
AIOct 13, 2017
Combinatorial Multi-armed Bandits for Real-Time Strategy GamesSantiago Ontañón
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called {\em naïve sampling}, based on a variant of the Multi-armed Bandit problem called {\em Combinatorial Multi-armed Bandits} (CMAB). We analyze the theoretical properties of several variants of {\em naïve sampling}, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, {\em naïve sampling} outperforms the other sampling strategies.
HCJun 23, 2016
The VGLC: The Video Game Level CorpusAdam James Summerville, Sam Snodgrass, Michael Mateas et al.
Levels are a key component of many different video games, and a large body of work has been produced on how to procedurally generate game levels. Recently, Machine Learning techniques have been applied to video game level generation towards the purpose of automatically generating levels that have the properties of the training corpus. Towards that end we have made available a corpora of video game levels in an easy to parse format ideal for different machine learning and other game AI research purposes.
AIMay 17, 2016
Combat Models for RTS GamesAlberto Uriarte, Santiago Ontañón
Game tree search algorithms, such as Monte Carlo Tree Search (MCTS), require access to a forward model (or "simulator") of the game at hand. However, in some games such forward model is not readily available. This paper presents three forward models for two-player attrition games, which we call "combat models", and show how they can be used to simulate combat in RTS games. We also show how these combat models can be learned from replay data. We use StarCraft as our application domain. We report experiments comparing our combat models predicting a combat output and their impact when used for tactical decisions during a real game.