LGSep 19, 2022
MAN: Multi-Action Networks LearningKeqin Wang, Alison Bartsch, Amir Barati Farimani
Learning control policies with large discrete action spaces is a challenging problem in the field of reinforcement learning due to present inefficiencies in exploration. With high dimensional action spaces, there are a large number of potential actions in each individual dimension over which policies would be learned. In this work, we introduce a Deep Reinforcement Learning (DRL) algorithm call Multi-Action Networks (MAN) Learning that addresses the challenge of high-dimensional large discrete action spaces. We propose factorizing the N-dimension action space into N 1-dimensional components, known as sub-actions, creating a Value Neural Network for each sub-action. Then, MAN uses temporal-difference learning to train the networks synchronously, which is simpler than training a single network with a large action output directly. To evaluate the proposed method, we test MAN on three scenarios: an n-dimension maze task, a block stacking task, and then extend MAN to handle 12 games from the Atari Arcade Learning environment with 18 action spaces. Our results indicate that MAN learns faster than both Deep Q-Learning and Double Deep Q-Learning, implying our method is a better performing synchronous temporal difference algorithm than those currently available for large discrete action spaces.
LGJan 31, 2025
Resolving Oversmoothing with Opinion DissensusKeqin Wang, Yulong Yang, Ishan Saha et al.
While graph neural networks (GNNs) have allowed researchers to successfully apply neural networks to non-Euclidean domains, deep GNNs often exhibit lower predictive performance than their shallow counterparts. This phenomena has been attributed in part to oversmoothing, the tendency of node representations to become increasingly similar with network depth. In this paper we introduce an analogy between oversmoothing in GNNs and consensus (i.e., perfect agreement) in the opinion dynamics literature. We show that the message passing algorithms of several GNN models are equivalent to linear opinion dynamics models which have been shown to converge to consensus for all inputs regardless of the graph structure. This new perspective on oversmoothing motivates the use of nonlinear opinion dynamics as an inductive bias in GNN models. In our Behavior-Inspired Message Passing (BIMP) GNN, we leverage the nonlinear opinion dynamics model which is more general than the linear opinion dynamics model, and can be designed to converge to dissensus for general inputs. Through extensive experiments we show that BIMP resists oversmoothing beyond 100 time steps and consistently outperforms existing architectures even when those architectures are amended with oversmoothing mitigation techniques. We also show that BIMP has several desirable properties including well behaved gradients and adaptability to homophilic and heterophilic datasets.
LGJun 20, 2024
Behavior-Inspired Neural Networks for Relational InferenceYulong Yang, Bowen Feng, Keqin Wang et al.
From pedestrians to Kuramoto oscillators, interactions between agents govern how dynamical systems evolve in space and time. Discovering how these agents relate to each other has the potential to improve our understanding of the often complex dynamics that underlie these systems. Recent works learn to categorize relationships between agents based on observations of their physical behavior. These approaches model relationship categories as outcomes of a categorical distribution which is limiting and contrary to real-world systems, where relationship categories often intermingle and interact. In this work, we introduce a level of abstraction between the observable behavior of agents and the latent categories that determine their behavior. To do this, we learn a mapping from agent observations to agent preferences for a set of latent categories. The learned preferences and inter-agent proximity are integrated in a nonlinear opinion dynamics model, which allows us to naturally identify mutually exclusive categories, predict an agent's evolution in time, and control an agent's behavior. Through extensive experiments, we demonstrate the utility of our model for learning interpretable categories, and the efficacy of our model for long-horizon trajectory prediction.