Santiago Ontañón

h-index31

38papers

11,848citations

Novelty43%

AI Score35

Ranked #104,396 of 194,257 authors (top 54%)#22,976 in LG (top 57%)

38 Papers

25.6CLMar 17, 2023

CoLT5: Faster Long-Range Transformers with Conditional Computation

Joshua Ainslie, Tao Lei, Michiel de Jong et al. · deepmind

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.

16.7AIMar 28, 2022

LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models

Santiago Ontanon, Joshua Ainslie, Vaclav Cvicek et al. · deepmind

Machine learning models such as Transformers or LSTMs struggle with tasks that are compositional in nature such as those involving reasoning/inference. Although many datasets exist to evaluate compositional generalization, when it comes to evaluating inference abilities, options are more limited. This paper presents LogicInference, a new dataset to evaluate the ability of models to perform logical inference. The dataset focuses on inference using propositional logic and a small subset of first-order logic, represented both in semi-formal logical notation, as well as in natural language. We also report initial results using a collection of machine learning models to establish an initial baseline in this dataset.

21.9CLSep 19, 2024

Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries

Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni et al. · deepmind

We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models which is also easy to automatically score. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts which measure the model's ability to do more than retrieve a single piece of information from its context. The central idea of the Latent Structure Queries framework (LSQ) is to construct tasks which require a model to ``chisel away'' the irrelevant information in the context, revealing a latent structure in the context. To verify a model's understanding of this latent structure, we query the model for details of the structure. Using LSQ, we produce three diagnostic long-context evaluations across code and natural-language domains intended to provide a stronger signal of long-context language model capabilities. We perform evaluations on several state-of-the-art models and demonstrate both that a) the proposed evaluations are high-signal and b) that there is significant room for improvement in synthesizing long-context information.

9.6CLAug 28, 2023

MEMORY-VQ: Compression for Tractable Internet-Scale Memory

Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis et al.

Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.

9.8LGSep 29, 2023Code

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Shengyi Huang, Jiayi Weng, Rujikorn Charakorn et al.

Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time. Despite recent progress in the field, reproducibility issues have not been sufficiently explored. This paper first shows that the typical actor-learner framework can have reproducibility issues even if hyperparameters are controlled. We then introduce Cleanba, a new open-source platform for distributed DRL that proposes a highly reproducible architecture. Cleanba implements highly optimized distributed variants of PPO and IMPALA. Our Atari experiments show that these variants can obtain equivalent or higher scores than strong IMPALA baselines in moolib and torchbeast and PPO baseline in CleanRL. However, Cleanba variants present 1) shorter training time and 2) more reproducible learning curves in different hardware settings. Cleanba's source code is available at \url{https://github.com/vwxyzjn/cleanba}

27.3LGOct 6, 2023

Functional Interpolation for Relative Positions Improves Long Context Transformers

Shanda Li, Chong You, Guru Guruganesh et al.

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has fundamentally no limits on the input sequence lengths it can process, the choice of position encoding used during training can limit the performance of these models on longer inputs. We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts. We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple. We next empirically show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.

14.1LGMay 18, 2022Code

A2C is a special case of PPO

Shengyi Huang, Anssi Kanervisto, Antonin Raffin et al.

Advantage Actor-critic (A2C) and Proximal Policy Optimization (PPO) are popular deep reinforcement learning algorithms used for game AI in recent years. A common understanding is that A2C and PPO are separate algorithms because PPO's clipped objective appears significantly different than A2C's objective. In this paper, however, we show A2C is a special case of PPO. We present theoretical justifications and pseudocode analysis to demonstrate why. To validate our claim, we conduct an empirical experiment using \texttt{Stable-baselines3}, showing A2C and PPO produce the \textit{exact} same models when other settings are controlled.

5.4AIFeb 18, 2023

Improving Fairness in Adaptive Social Exergames via Shapley Bandits

Robert C. Gray, Jennifer Villareale, Thomas B. Fox et al.

Algorithmic fairness is an essential requirement as AI becomes integrated in society. In the case of social applications where AI distributes resources, algorithms often must make decisions that will benefit a subset of users, sometimes repeatedly or exclusively, while attempting to maximize specific outcomes. How should we design such systems to serve users more fairly? This paper explores this question in the case where a group of users works toward a shared goal in a social exergame called Step Heroes. We identify adverse outcomes in traditional multi-armed bandits (MABs) and formalize the Greedy Bandit Problem. We then propose a solution based on a new type of fairness-aware multi-armed bandit, Shapley Bandits. It uses the Shapley Value for increasing overall player participation and intervention adherence rather than the maximization of total group output, which is traditionally achieved by favoring only high-performing participants. We evaluate our approach via a user study (n=46). Our results indicate that our Shapley Bandits effectively mediates the Greedy Bandit Problem and achieves better user retention and motivation across the participants.

62.2CLMar 8, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, Petko Georgiev, Ving Ian Lei et al. · deepmind, mila

In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

12.5LGMay 21, 2021Code

Gym-$μ$RTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning

Shengyi Huang, Santiago Ontañón, Chris Bamford et al.

In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that could defeat professional players in StarCraft~II. However, existing approaches to tackle full games have high computational costs, usually requiring the use of thousands of GPUs and CPUs for weeks. This paper has two main contributions to address this issue: 1) We introduce Gym-$μ$RTS (pronounced "gym-micro-RTS") as a fast-to-run RL environment for full-game RTS research and 2) we present a collection of techniques to scale DRL to play full-game $μ$RTS as well as ablation studies to demonstrate their empirical importance. Our best-trained bot can defeat every $μ$RTS bot we tested from the past $μ$RTS competitions when working in a single-map setting, resulting in a state-of-the-art DRL agent while only taking about 60 hours of training using a single machine (one GPU, three vCPU, 16GB RAM). See the blog post at https://wandb.ai/vwxyzjn/gym-microrts-paper/reports/Gym-RTS-Toward-Affordable-Deep-Reinforcement-Learning-Research-in-Real-Time-Strategy-Games--Vmlldzo2MDIzMTg and the source code at https://github.com/vwxyzjn/gym-microrts-paper

29.0LGJun 25, 2020Code

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Shengyi Huang, Santiago Ontañón

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to "mask out" invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking. The source code can be found at https://github.com/vwxyzjn/invalid-action-masking

2.5AIApr 23, 2016Code

RHOG: A Refinement-Operator Library for Directed Labeled Graphs

Santiago Ontañón

This document provides the foundations behind the functionality provided by the $ρ$G library (https://github.com/santiontanon/RHOG), focusing on the basic operations the library provides: subsumption, refinement of directed labeled graphs, and distance/similarity assessment between directed labeled graphs. $ρ$G development was initially supported by the National Science Foundation, by the EAGER grant IIS-1551338.

21.7CLMay 18, 2023Code

mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences

David Uthus, Santiago Ontañón, Joshua Ainslie et al.

We present our work on developing a multilingual, efficient text-to-text transformer that is suitable for handling long inputs. This model, called mLongT5, builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2. We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.

1.7CLMay 8, 2023

Multi-Task End-to-End Training Improves Conversational Recommendation

Naveen Ram, Dima Kuzmin, Ellie Ka In Chio et al.

In this paper, we analyze the performance of a multitask end-to-end transformer model on the task of conversational recommendations, which aim to provide recommendations based on a user's explicit preferences expressed in dialogue. While previous works in this area adopt complex multi-component approaches where the dialogue management and entity recommendation tasks are handled by separate components, we show that a unified transformer model, based on the T5 text-to-text transformer model, can perform competitively in both recommending relevant items and generating conversation dialogue. We fine-tune our model on the ReDIAL conversational movie recommendation dataset, and create additional training tasks derived from MovieLens (such as the prediction of movie attributes and related movies based on an input movie), in a multitask learning setting. Using a series of probe studies, we demonstrate that the learned knowledge in the additional tasks is transferred to the conversational setting, where each task leads to a 9%-52% increase in its related probe score.

32.8CLDec 15, 2021Code

LongT5: Efficient Text-To-Text Transformer for Long Sequences

Mandy Guo, Joshua Ainslie, David Uthus et al.

Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call {\em Transient Global} (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.

1.4CVNov 12, 2021

Identifying On-road Scenarios Predictive of ADHD usingDriving Simulator Time Series Data

David Grethlein, Aleksanteri Sladek, Santiago Ontañón

In this paper we introduce a novel algorithm called Iterative Section Reduction (ISR) to automatically identify sub-intervals of spatiotemporal time series that are predictive of a target classification task. Specifically, using data collected from a driving simulator study, we identify which spatial regions (dubbed "sections") along the simulated routes tend to manifest driving behaviors that are predictive of the presence of Attention Deficit Hyperactivity Disorder (ADHD). Identifying these sections is important for two main reasons: (1) to improve predictive accuracy of the trained models by filtering out non-predictive time series sub-intervals, and (2) to gain insights into which on-road scenarios (dubbed events) elicit distinctly different driving behaviors from patients undergoing treatment for ADHD versus those that are not. Our experimental results show both improved performance over prior efforts (+10% accuracy) and good alignment between the predictive sections identified and scripted on-road events in the simulator (negotiating turns and curves).

7.5LGOct 8, 2021

Iterative Decoding for Compositional Generalization in Transformers

Luana Ruiz, Joshua Ainslie, Santiago Ontañón

Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for longer examples than those seen at training. This paper introduces iterative decoding, an alternative to seq2seq that (i) improves transformer compositional generalization in the PCFG and Cartesian product datasets and (ii) evidences that, in these datasets, seq2seq transformers do not learn iterations that are not unrolled. In iterative decoding, training examples are broken down into a sequence of intermediate steps that the transformer learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. We conclude by illustrating some limitations of iterative decoding in the CFQ dataset.

57.3AIAug 9, 2021

Making Transformers Solve Compositional Tasks

Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek et al.

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in a diverse set of compositional tasks, and that achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).

52.9LGJun 19, 2021

Improving Compositional Generalization in Classification Tasks via Structure Annotations

Juyong Kim, Pradeep Ravikumar, Joshua Ainslie et al.

Compositional generalization is the ability to generalize systematically to a new data distribution by combining known components. Although humans seem to have a great ability to generalize compositionally, state-of-the-art neural models struggle to do so. In this work, we study compositional generalization in classification tasks and present two main contributions. First, we study ways to convert a natural language sequence-to-sequence dataset to a classification dataset that also requires compositional generalization. Second, we show that providing structural hints (specifically, providing parse trees and entity links as attention masks for a Transformer model) helps compositional generalization.

5.9SDMay 20, 2021

Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries

Sukhdeep S. Sodhi, Ellie Ka-In Chio, Ambarish Jash et al.

As more and more online search queries come from voice, automatic speech recognition becomes a key component to deliver relevant search results. Errors introduced by automatic speech recognition (ASR) lead to irrelevant search results returned to the user, thus causing user dissatisfaction. In this paper, we introduce an approach, Mondegreen, to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints or privacy or bandwidth (for example, some ASR systems run on-device) considerations. We focus on voice queries transcribed via several proprietary commercial ASR systems. These queries come from users making internet, or online service search queries. We first present an analysis showing how different the language distribution coming from user voice queries is from that in traditional text corpora used to train off-the-shelf ASR systems. We then demonstrate that Mondegreen can achieve significant improvements in increased user interaction by correcting user voice queries in one of the largest search systems in Google. Finally, we see Mondegreen as complementing existing highly-optimized production ASR systems, which may not be frequently retrained and thus lag behind due to vocabulary drifts.

34.2CLMay 9, 2021Code

FNet: Mixing Tokens with Fourier Transforms

James Lee-Thorp, Joshua Ainslie, Ilya Eckstein et al.

We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths, our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models outperform Transformer counterparts.

6.4HCMar 2, 2021

The Personalization Paradox: the Conflict between Accurate User Models and Personalized Adaptive Systems

Santiago Ontañón, Jichen Zhu

Personalized adaptation technology has been adopted in a wide range of digital applications such as health, training and education, e-commerce and entertainment. Personalization systems typically build a user model, aiming to characterize the user at hand, and then use this model to personalize the interaction. Personalization and user modeling, however, are often intrinsically at odds with each other (a fact some times referred to as the personalization paradox). In this paper, we take a closer look at this personalization paradox, and identify two ways in which it might manifest: feedback loops and moving targets. To illustrate these issues, we report results in the domain of personalized exergames (videogames for physical exercise), and describe our early steps to address some of the issues arisen by the personalization paradox.

10.1AIFeb 15, 2021

Player-Centered AI for Automatic Game Personalization: Open Problems

Jichen Zhu, Santiago Ontañón

Computer games represent an ideal research domain for the next generation of personalized digital applications. This paper presents a player-centered framework of AI for game personalization, complementary to the commonly used system-centered approaches. Built on the Structure of Actions theory, the paper maps out the current landscape of game personalization research and identifies eight open problems that need further investigation. These problems require deep collaboration between technological advancement and player experience design.

7.6AIFeb 10, 2021

Player Modeling via Multi-Armed Bandits

Robert C. Gray, Jichen Zhu, Dannielle Arigo et al.

This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: The first is a novel approach to player modeling based on multi-armed bandits (MABs). This approach addresses, at the same time and in a principled way, both the problem of collecting data to model the characteristics of interest for the current player and the problem of adapting the interactive experience based on this model. Second, we present an approach to evaluating and fine-tuning these algorithms prior to generating data in a user study. This is an important problem, because conducting user studies is an expensive and labor-intensive process; therefore, an ability to evaluate the algorithms beforehand can save a significant amount of resources. We evaluate our approach in the context of modeling players' social comparison orientation (SCO) and present empirical results from both simulations and real players.

5.5LGFeb 10, 2021

Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits

Robert C. Gray, Jichen Zhu, Santiago Ontañón

This paper explores multi-armed bandit (MAB) strategies in very short horizon scenarios, i.e., when the bandit strategy is only allowed very few interactions with the environment. This is an understudied setting in the MAB literature with many applications in the context of games, such as player modeling. Specifically, we pursue three different ideas. First, we explore the use of regression oracles, which replace the simple average used in strategies such as epsilon-greedy with linear regression models. Second, we examine different exploration patterns such as forced exploration phases. Finally, we introduce a new variant of the UCB1 strategy called UCBT that has interesting properties and no tunable parameters. We present experimental results in a domain motivated by exergames, where the goal is to maximize a player's daily steps. Our results show that the combination of epsilon-greedy or epsilon-decreasing with regression oracles outperforms all other tested strategies in the short horizon setting.

6.4HCJan 25, 2021

Personalization Paradox in Behavior Change Apps: Lessons from a Social Comparison-Based Personalized App for Physical Activity

Jichen Zhu, Diane H. Dallal, Robert C. Gray et al.

Social comparison-based features are widely used in social computing apps. However, most existing apps are not grounded in social comparison theories and do not consider individual differences in social comparison preferences and reactions. This paper is among the first to automatically personalize social comparison targets. In the context of an m-health app for physical activity, we use artificial intelligence (AI) techniques of multi-armed bandits. Results from our user study (n=53) indicate that there is some evidence that motivation can be increased using the AI-based personalization of social comparison. The detected effects achieved small-to-moderate effect sizes, illustrating the real-world implications of the intervention for enhancing motivation and physical activity. In addition to design implications for social comparison features in social apps, this paper identified the personalization paradox, the conflict between user modeling and adaptation, as a key design challenge of personalized applications for behavior change. Additionally, we propose research directions to mitigate this Personalization Paradox.

6.5LGOct 5, 2020Code

Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games

Shengyi Huang, Santiago Ontañón

Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward. To tackle this problem, a common approach is to use reward shaping to help exploration. However, an important drawback of reward shaping is that agents sometimes learn to optimize the shaped reward instead of the true objective. In this paper, we present a novel technique that we call action guidance that successfully trains agents to eventually optimize the true objective in games with sparse rewards while maintaining most of the sample efficiency that comes with reward shaping. We evaluate our approach in a simplified real-time strategy (RTS) game simulator called $μ$RTS.

56.3LGJul 28, 2020Code

Big Bird: Transformers for Longer Sequences

Manzil Zaheer, Guru Guruganesh, Avinava Dubey et al.

Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.

5.8HCMay 10, 2020Code

Understanding Learners' Problem-Solving Strategies in Concurrent and Parallel Programming: A Game-Based Approach

Jichen Zhu, Katelyn Alderfer, Brian Smith et al.

Concurrent and parallel programming (CPP) is an increasingly important subject in Computer Science Education. However, the conceptual shift from sequential programming is notoriously difficult to make. Currently, relatively little research exists on how people learn CPP core concepts. This paper presents our results of using Parallel, an educational game about CPP, focusing on the learners' self-efficacy and how they learn CPP concepts. Based on a study of 44 undergraduate students, our research shows that (a) self-efficacy increased significantly after playing the game; (b) the problem-solving strategies employed by students playing the game can be classified in three main types: trial and error, single-thread, and multi-threaded strategies, and (c) that self-efficacy is correlated with the percentage of time students spend in multithreaded problem-solving.

55.1LGApr 17, 2020

ETC: Encoding Long and Structured Inputs in Transformers

Joshua Ainslie, Santiago Ontanon, Chris Alberti et al.

Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-local attention mechanism between global tokens and regular input tokens. We also show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training objective allows ETC to encode structured inputs. We achieve state-of-the-art results on four natural language datasets requiring long and/or structured inputs.

16.4AIFeb 18, 2020

An Overview of Distance and Similarity Functions for Structured Data

Santiago Ontañón

The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.

4.8LGOct 26, 2019Code

Comparing Observation and Action Representations for Deep Reinforcement Learning in $μ$RTS

Shengyi Huang, Santiago Ontañón

This paper presents a preliminary study comparing different observation and action space representations for Deep Reinforcement Learning (DRL) in the context of Real-time Strategy (RTS) games. Specifically, we compare two representations: (1) a global representation where the observation represents the whole game state, and the RL agent needs to choose which unit to issue actions to, and which actions to execute; and (2) a local representation where the observation is represented from the point of view of an individual unit, and the RL agent picks actions for each unit independently. We evaluate these representations in $μ$RTS showing that the local representation seems to outperform the global representation when training agents with the task of harvesting resources.

11.9AIAug 15, 2019

Tracing Player Knowledge in a Parallel Programming Educational Game

Pavan Kantharaju, Katelyn Alderfer, Jichen Zhu et al.

This paper focuses on "tracing player knowledge" in educational games. Specifically, given a set of concepts or skills required to master a game, the goal is to estimate the likelihood with which the current player has mastery of each of those concepts or skills. The main contribution of the paper is an approach that integrates machine learning and domain knowledge rules to find when the player applied a certain skill and either succeeded or failed. This is then given as input to a standard knowledge tracing module (such as those from Intelligent Tutoring Systems) to perform knowledge tracing. We evaluate our approach in the context of an educational game called "Parallel" to teach parallel and concurrent programming with data collected from real users, showing our approach can predict students skills with a low mean-squared error.

9.3HCJul 4, 2019

Experience Management in Multi-player Games

Jichen Zhu, Santiago Ontañón

Experience Management studies AI systems that automatically adapt interactive experiences such as games to tailor to specific players and to fulfill design goals. Although it has been explored for several decades, existing work in experience management has mostly focused on single-player experiences. This paper is a first attempt at identifying the main challenges to expand EM to multi-player/multi-user games or experiences. We also make connections to related areas where solutions for similar problems have been proposed (especially group recommender systems) and discusses the potential impact and applications of multi-player EM.

14.2AIOct 13, 2017

Combinatorial Multi-armed Bandits for Real-Time Strategy Games

Santiago Ontañón

Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called {\em naïve sampling}, based on a variant of the Multi-armed Bandit problem called {\em Combinatorial Multi-armed Bandits} (CMAB). We analyze the theoretical properties of several variants of {\em naïve sampling}, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, {\em naïve sampling} outperforms the other sampling strategies.

4.5AIMay 17, 2016

Combat Models for RTS Games

Alberto Uriarte, Santiago Ontañón

Game tree search algorithms, such as Monte Carlo Tree Search (MCTS), require access to a forward model (or "simulator") of the game at hand. However, in some games such forward model is not readily available. This paper presents three forward models for two-player attrition games, which we call "combat models", and show how they can be used to simulate combat in RTS games. We also show how these combat models can be learned from replay data. We use StarCraft as our application domain. We report experiments comparing our combat models predicting a combat output and their impact when used for tactical decisions during a real game.

2.4AIAug 9, 2012

Experiments with Game Tree Search in Real-Time Strategy Games

Santiago Ontanon

Game tree search algorithms such as minimax have been used with enormous success in turn-based adversarial games such as Chess or Checkers. However, such algorithms cannot be directly applied to real-time strategy (RTS) games because a number of reasons. For example, minimax assumes a turn-taking game mechanics, not present in RTS games. In this paper we present RTMM, a real-time variant of the standard minimax algorithm, and discuss its applicability in the context of RTS games. We discuss its strengths and weaknesses, and evaluate it in two real-time games.

2.5NEApr 11, 2012

Automated Generation of Cross-Domain Analogies via Evolutionary Computation

Atilim Gunes Baydin, Ramon Lopez de Mantaras, Santiago Ontanon

Analogy plays an important role in creativity, and is extensively used in science as well as art. In this paper we introduce a technique for the automated generation of cross-domain analogies based on a novel evolutionary algorithm (EA). Unlike existing work in computational analogy-making restricted to creating analogies between two given cases, our approach, for a given case, is capable of creating an analogy along with the novel analogous case itself. Our algorithm is based on the concept of "memes", which are units of culture, or knowledge, undergoing variation and selection under a fitness measure, and represents evolving pieces of knowledge as semantic networks. Using a fitness function based on Gentner's structure mapping theory of analogies, we demonstrate the feasibility of spontaneously generating semantic networks that are analogous to a given base network.