Henry Kuo

ML, Jul 10, 2023

Loss Dynamics of Temporal Difference Reinforcement Learning

Blake Bordelon, Paul Masset, Henry Kuo et al.

Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. Despite this empirical success, there is still little theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over the random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. In conclusion, our work introduces new tools that open a direction toward a theory of learning dynamics in reinforcement learning.
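For readers unfamiliar with the setting, the following is a minimal sketch of semi-gradient TD(0) with a linear value function, the kind of learning rule the abstract refers to. It is not the authors' code: the function name `td0_linear`, the toy two-state chain, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random

def td0_linear(features, rewards, transitions, gamma=0.9, lr=0.05,
               episodes=200, steps=20, seed=0):
    """Semi-gradient TD(0) for V(s) ~ w . phi(s) on a small Markov reward process.

    features[s]    : feature vector phi(s) for state s (illustrative input)
    rewards[s]     : reward received when leaving state s
    transitions[s] : list of transition probabilities out of state s
    """
    rng = random.Random(seed)
    n_states = len(features)
    w = [0.0] * len(features[0])
    for _ in range(episodes):
        s = rng.randrange(n_states)  # each episode starts in a random state
        for _ in range(steps):
            s_next = rng.choices(range(n_states), weights=transitions[s])[0]
            v = sum(wi * xi for wi, xi in zip(w, features[s]))
            v_next = sum(wi * xi for wi, xi in zip(w, features[s_next]))
            # TD error: bootstrapped target minus current estimate
            delta = rewards[s] + gamma * v_next - v
            # Semi-gradient step: only V(s) is differentiated, not the target
            w = [wi + lr * delta * xi for wi, xi in zip(w, features[s])]
            s = s_next
    return w

# Toy chain (assumed): state 0 moves to state 1, state 1 loops on itself.
# With one-hot features the true values are V(0) = 1 and V(1) = 0.
weights = td0_linear(
    features=[[1.0, 0.0], [0.0, 1.0]],
    rewards=[1.0, 0.0],
    transitions=[[0.0, 1.0], [1.0, 0.0 + 1.0 - 1.0 + 0.0] if False else [0.0, 1.0]],
)
```

Because each update uses a single sampled transition, the weight trajectory is noisy; the learning rate `lr` and discount factor `gamma` are among the quantities whose effect on learning curves the abstract says the paper studies.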

NE, Aug 5, 2019

Reusability and Transferability of Macro Actions for Reinforcement Learning

Yi-Hsiang Chang, Kuan-Yu Chang, Henry Kuo et al.

Conventional reinforcement learning (RL) typically determines an appropriate primitive action at each timestep. However, by using a proper macro action, defined as a sequence of primitive actions, an agent is able to bypass intermediate states to reach a farther state and facilitate its learning procedure. The question we investigate is what beneficial properties macro actions may possess. In this paper, we unveil the properties of reusability and transferability of macro actions. The first property, reusability, means that a macro action generated along with one RL method can be reused by another RL method for training, while the second, transferability, means that a macro action can be utilized for training agents in similar environments with different reward settings. In our experiments, we first generate macro actions along with RL methods. We then provide a set of analyses to reveal the reusability and transferability of the generated macro actions.
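As a rough illustration of the definition above (a macro action as a sequence of primitive actions), the sketch below executes a macro action as a single decision in a toy environment. The names `corridor_step` and `run_macro`, and the corridor environment itself, are hypothetical and are not taken from the paper.

```python
def corridor_step(state, action):
    """Hypothetical 1-D corridor with positions 0..5; reaching 5 ends the episode."""
    new_state = max(0, min(5, state + action))
    done = new_state == 5
    reward = 1.0 if done else 0.0
    return new_state, reward, done

def run_macro(step, state, macro, gamma=1.0):
    """Execute a macro action (a list of primitive actions) as one decision.

    Returns the final state, the discounted reward accumulated along the way,
    and whether the episode ended while the macro was running.
    """
    total, discount, done = 0.0, 1.0, False
    for action in macro:
        state, reward, done = step(state, action)
        total += discount * reward
        discount *= gamma
        if done:  # stop early if the episode terminates mid-macro
            break
    return state, total, done

# One macro decision skips the intermediate positions 1 and 2.
state, total, done = run_macro(corridor_step, 0, [1, 1, 1])
```

From the agent's point of view, `run_macro` behaves like a single step of a coarser environment, which is what lets a macro action found with one RL method be handed to another method or tried in a similar environment.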

2 Papers