Kazuhiro Ota

LGJun 2, 2022

Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes

Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe et al.

Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes a parameterized policy model for an expected return using gradient ascent. While PG can work well even in non-Markovian environments, it may encounter plateaus or peakiness issues. As another successful RL approach, algorithms based on Monte Carlo Tree Search (MCTS), which include AlphaZero, have obtained groundbreaking results, especially in the game-playing domain. They are also effective when applied to non-Markov decision processes. However, the standard MCTS is a method for decision-time planning, which differs from the online RL setting. In this work, we first introduce Monte Carlo Tree Learning (MCTL), an adaptation of MCTS for online RL setups. We then explore a combined policy approach of PG and MCTL to leverage their strengths. We derive conditions for asymptotic convergence with the results of a two-timescale stochastic approximation and propose an algorithm that satisfies these conditions and converges to a reasonable solution. Our numerical experiments validate the effectiveness of the proposed methods.

LGMay 20, 2021

Aggregate Learning for Mixed Frequency Data

Takamichi Toda, Daisuke Moriwaki, Kazuhiro Ota

Large and acute economic shocks such as the 2007-2009 financial crisis and the current COVID-19 infections rapidly change the economic environment. In such a situation, the importance of real-time economic analysis using alternative datais emerging. Alternative data such as search query and location data are closer to real-time and richer than official statistics that are typically released once a month in an aggregated form. We take advantage of spatio-temporal granularity of alternative data and propose a mixed-FrequencyAggregate Learning (MF-AGL)model that predicts economic indicators for the smaller areas in real-time. We apply the model for the real-world problem; prediction of the number of job applicants which is closely related to the unemployment rates. We find that the proposed model predicts (i) the regional heterogeneity of the labor market condition and (ii) the rapidly changing economic status. The model can be applied to various tasks, especially economic analysis

Kazuhiro Ota

2 Papers