LG · Oct 29, 2022
BIMRL: Brain Inspired Meta Reinforcement Learning
Seyed Roozbeh Razavi Rohani, Saeed Hedayatian, Mahdieh Soleymani Baghshah
Sample efficiency has been a key issue in reinforcement learning (RL). An efficient agent must be able to leverage its prior experience to adapt quickly to new tasks and situations that are similar to ones it has already seen. Meta-RL is one attempt to formalize and address this issue. Inspired by recent progress in meta-RL, we introduce BIMRL, a novel multi-layer architecture with a brain-inspired memory module that helps agents adapt to new tasks within a few episodes. We also use this memory module to design a novel intrinsic reward that guides the agent's exploration. Our architecture is inspired by findings in cognitive neuroscience and is consistent with what is known about the connectivity and functionality of different brain regions. We empirically validate the effectiveness of the proposed method on multiple MiniGrid environments, where it is competitive with or outperforms several strong baselines.
LG · Nov 30, 2025
Soft Quality-Diversity Optimization
Saeed Hedayatian, Stefanos Nikolaidis
Quality-Diversity (QD) algorithms constitute a branch of optimization concerned with discovering a diverse set of high-quality solutions to an optimization problem. Current QD methods commonly maintain diversity by dividing the behavior space into discrete regions, so that solutions are distributed across different parts of the space; the QD problem is then solved by searching for the best solution in each region. This approach struggles in large solution spaces, where storing many solutions is impractical, and in high-dimensional behavior spaces, where discretization becomes ineffective due to the curse of dimensionality. We present an alternative framing of the QD problem, called Soft QD, that sidesteps the need for discretization. We validate this formulation by demonstrating its desirable properties, such as monotonicity, and by relating its limiting behavior to the widely used QD Score metric. Furthermore, we use it to derive a novel differentiable QD algorithm, Soft QD Using Approximated Diversity (SQUAD), and show empirically that it is competitive with current state-of-the-art methods on standard benchmarks while scaling better to higher-dimensional problems.
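The discretized approach described above can be made concrete with a minimal sketch of a MAP-Elites-style grid archive and its QD Score (the sum of the best fitness found in each occupied cell). This illustrates the conventional baseline that Soft QD is framed against, not the Soft QD objective or SQUAD itself; the function name `grid_qd_score`, the uniform grid, and the assumption of non-negative fitnesses are illustrative choices.

```python
import numpy as np


def grid_qd_score(behaviors, fitnesses, low, high, bins_per_dim):
    """QD Score of a discretized (MAP-Elites-style) archive.

    Each behavior descriptor is mapped to a cell of a uniform grid; only the
    best (elite) fitness per cell is kept, and the QD Score is the sum of the
    elite fitnesses. Fitnesses are assumed non-negative (in practice an offset
    is often added to guarantee this). Returns (score, number of filled cells).
    """
    behaviors = np.asarray(behaviors, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Normalize each descriptor to [0, 1] and convert to integer cell coordinates;
    # clipping keeps boundary points (e.g. exactly `high`) inside the grid.
    frac = (behaviors - low) / (high - low)
    idx = np.clip((frac * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    elites = {}
    for cell, f in zip(map(tuple, idx), fitnesses):
        # Keep only the best solution found so far in each cell.
        if cell not in elites or f > elites[cell]:
            elites[cell] = f
    return sum(elites.values()), len(elites)


score, n_cells = grid_qd_score(
    behaviors=[[0.1, 0.1], [0.15, 0.12], [0.9, 0.9]],
    fitnesses=[1.0, 3.0, 2.0],
    low=[0.0, 0.0],
    high=[1.0, 1.0],
    bins_per_dim=10,
)
print(score, n_cells)  # 5.0 2
```

The first two solutions fall into the same cell, so only the better one (fitness 3.0) counts. The sketch also shows where the curse of dimensionality enters: a grid with 10 bins per dimension has 10**d cells, i.e. 100 cells for a 2-D behavior space but 10**10 for a 10-D one.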
LG · Jun 5, 2025
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
Saeed Hedayatian, Stefanos Nikolaidis
Quality-Diversity (QD) algorithms have shown remarkable success in discovering diverse, high-performing solutions, but they rely heavily on hand-crafted behavioral descriptors that constrain exploration to predefined notions of diversity. Leveraging the equivalence between policies and occupancy measures, we present a theoretically grounded approach that automatically generates behavioral descriptors by embedding the occupancy measures of policies in Markov Decision Processes. Our method, AutoQD, uses random Fourier features to approximate the Maximum Mean Discrepancy (MMD) between policy occupancy measures, producing embeddings whose distances reflect meaningful behavioral differences. A low-dimensional projection of these embeddings that captures the most behaviorally significant dimensions is then used as the behavioral descriptor for off-the-shelf QD methods. We prove that distances between our embeddings converge to the true MMD between occupancy measures as the number of sampled trajectories and the embedding dimension increase. Through experiments on multiple continuous control tasks, we demonstrate AutoQD's ability to discover diverse policies without predefined behavioral descriptors, offering a well-motivated alternative to prior methods in unsupervised reinforcement learning and QD optimization. Our approach opens new possibilities for open-ended learning and automated behavior discovery in sequential decision-making settings without requiring domain-specific knowledge.
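The idea of comparing policies through their occupancy measures can be sketched with random Fourier features (Rahimi and Recht) for a Gaussian kernel: each policy's visited state-action vectors are mapped to features, averaged with discount weights into a mean embedding, and the Euclidean distance between two mean embeddings approximates the MMD between the underlying distributions. This is a generic illustration under stated assumptions, not AutoQD's implementation: the Gaussian kernel and its bandwidth `sigma`, the `gamma**t` discount weighting, and the use of PCA as the low-dimensional projection are all assumptions, and every name here is hypothetical.

```python
import numpy as np


class RandomFourierFeatures:
    """Random Fourier features for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), so that phi(x) @ phi(y) ~= k(x, y).
    The kernel choice and bandwidth are assumptions for illustration.
    """

    def __init__(self, input_dim, num_features, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frequencies drawn from the kernel's spectral density N(0, sigma^-2 I).
        self.W = rng.normal(0.0, 1.0 / sigma, size=(input_dim, num_features))
        # Random phase offsets in [0, 2*pi).
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.scale = np.sqrt(2.0 / num_features)

    def __call__(self, X):
        # X has shape (n, input_dim); returns features of shape (n, num_features).
        return self.scale * np.cos(X @ self.W + self.b)


def occupancy_embedding(trajectories, rff, gamma=0.99):
    """Discount-weighted mean feature embedding of one policy's visitation data.

    Each trajectory is an array of shape (T, input_dim), e.g. concatenated
    (state, action) vectors. Weighting step t by gamma**t is an assumption.
    """
    total = np.zeros_like(rff.b)
    weight_sum = 0.0
    for traj in trajectories:
        w = gamma ** np.arange(len(traj))
        total += w @ rff(traj)  # weighted sum of per-step feature vectors
        weight_sum += w.sum()
    return total / weight_sum


def approx_mmd(emb_p, emb_q):
    # Distance between mean embeddings approximates MMD under the chosen kernel.
    return float(np.linalg.norm(emb_p - emb_q))


def project_descriptors(embeddings, k=2):
    """Hypothetical low-dimensional projection via PCA (an assumption; the
    abstract does not name the projection method)."""
    E = np.asarray(embeddings)
    centered = E - E.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


# Usage: two hypothetical "policies" whose visited points differ by a shift.
rng = np.random.default_rng(0)
rff = RandomFourierFeatures(input_dim=2, num_features=2000, sigma=1.0, seed=1)
policy_a = [rng.normal(0.0, 1.0, size=(200, 2)) for _ in range(3)]
policy_b = [rng.normal(0.0, 1.0, size=(200, 2)) + [3.0, 0.0] for _ in range(3)]
emb_a = occupancy_embedding(policy_a, rff)
emb_b = occupancy_embedding(policy_b, rff)
print(approx_mmd(emb_a, emb_b))
print(project_descriptors([emb_a, emb_b], k=1))
```

Increasing `num_features` reduces the error of the kernel approximation, and using more trajectories reduces the sampling error of each mean embedding; these are the same two quantities that the convergence claim in the abstract refers to.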