CLNov 29, 2022Code
DiffG-RL: Leveraging Difference between State and Common SenseTsunehiko Tanaka, Daiki Kimura, Michiaki Tatsubori
Taking into account background knowledge as the context has always been an important part of solving tasks that involve natural language. One representative example of such tasks is text-based games, where players need to make decisions based on both description text previously shown in the game, and their own background knowledge about the language and common sense. In this work, we investigate not simply giving common sense, as can be seen in prior research, but also its effective usage. We assume that a part of the environment states different from common sense should constitute one of the grounds for action selection. We propose a novel agent, DiffG-RL, which constructs a Difference Graph that organizes the environment states and common sense by means of interactive objects with a dedicated graph encoder. DiffG-RL also contains a framework for extracting the appropriate amount and representation of common sense from the source to support the construction of the graph. We validate DiffG-RL in experiments with text-based games that require common sense and show that it outperforms baselines by 17% of scores. The code is available at https://github.com/ibm/diffg-rl
AIJul 24, 2024Code
Grammar-based Game Description Generation using Large Language ModelsTsunehiko Tanaka, Edgar Simo-Serra
Game Description Language (GDL) provides a standardized way to express diverse games in a machine-readable format, enabling automated game simulation, and evaluation. While previous research has explored game description generation using search-based methods, generating GDL descriptions from natural language remains a challenging task. This paper presents a novel framework that leverages Large Language Models (LLMs) to generate grammatically accurate game descriptions from natural language. Our approach consists of two stages: first, we gradually generate a minimal grammar based on GDL specifications; second, we iteratively improve the game description through grammar-guided generation. Our framework employs a specialized parser that identifies valid subsequences and candidate symbols from LLM responses, enabling gradual refinement of the output to ensure grammatical correctness. Experimental results demonstrate that our iterative improvement approach significantly outperforms baseline methods that directly use LLM outputs. Our code is available at https://github.com/tsunehiko/ggdg
CVOct 19, 2022
Commonsense Knowledge from Scene Graphs for Textual EnvironmentsTsunehiko Tanaka, Daiki Kimura, Michiaki Tatsubori
Text-based games are becoming commonly used in reinforcement learning as real-world simulation environments. They are usually imperfect information games, and their interactions are only in the textual modality. To challenge these games, it is effective to complement the missing information by providing knowledge outside the game, such as human common sense. However, such knowledge has only been available from textual information in previous works. In this paper, we investigate the advantage of employing commonsense reasoning obtained from visual datasets such as scene graph datasets. In general, images convey more comprehensive information compared with text for humans. This property enables to extract commonsense relationship knowledge more useful for acting effectively in a game. We compare the statistics of spatial relationships available in Visual Genome (a scene graph dataset) and ConceptNet (a text-based knowledge) to analyze the effectiveness of introducing scene graph datasets. We also conducted experiments on a text-based game task that requires commonsense reasoning. Our experimental results demonstrated that our proposed methods have higher and competitive performance than existing state-of-the-art methods.
LGFeb 6, 2024Code
Return-Aligned Decision TransformerTsunehiko Tanaka, Kenshi Abe, Kaito Ariu et al.
Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. It is increasingly important to adjust the performance of AI agents to meet human requirements, for example, in applications like video games and education tools. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and includes a mechanism to control the agent's performance using the target return. However, the action generation is hardly influenced by the target return because DT's self-attention allocates scarce attention scores to the return tokens. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to more effectively align the actual return with the target return. RADT leverages features extracted by paying attention solely to the return, enabling action generation to consistently depend on the target return. Extensive experiments show that RADT significantly reduces the discrepancies between the actual return and the target return compared to DT-based methods. Our code is available at https://github.com/CyberAgentAILab/radt
CLMar 20, 2025Code
Grammar and Gameplay-aligned RL for Game Description Generation with LLMsTsunehiko Tanaka, Edgar Simo-Serra
Game Description Generation (GDG) is the task of generating a game description written in a Game Description Language (GDL) from natural language text. Previous studies have explored generation methods leveraging the contextual understanding capabilities of Large Language Models (LLMs); however, accurately reproducing the game features of the game descriptions remains a challenge. In this paper, we propose reinforcement learning-based fine-tuning of LLMs for GDG (RLGDG). Our training method simultaneously improves grammatical correctness and fidelity to game concepts by introducing both grammar rewards and concept rewards. Furthermore, we adopt a two-stage training strategy where Reinforcement Learning (RL) is applied following Supervised Fine-Tuning (SFT). Experimental results demonstrate that our proposed method significantly outperforms baseline methods using SFT alone. Our code is available at https://github.com/tsunehiko/rlgdg