CL Aug 27, 2025
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan, Xiufeng Yang, Zuchao Huang et al.
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
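The four memory operations named in the abstract above can be illustrated with a minimal Python sketch. The names `MemoryBank`, `Op`, and `apply`, as well as the integer-id scheme, are assumptions made for illustration and are not taken from the paper. In Memory-R1 the choice of operation is made by an RL-trained Memory Manager, which is not modeled here; the sketch only shows what each operation might do to an external store.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    """The four structured operations listed in the abstract."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryBank:
    """Hypothetical external memory: integer ids mapped to text entries."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: Op, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        if op is Op.ADD:
            # Store a new entry under a fresh id.
            self.entries[self.next_id] = text or ""
            self.next_id += 1
        elif op is Op.UPDATE:
            # Overwrite an existing entry; refuse unknown ids.
            if entry_id not in self.entries:
                raise KeyError(f"no memory entry with id {entry_id}")
            self.entries[entry_id] = text or ""
        elif op is Op.DELETE:
            # Remove the entry if it exists.
            self.entries.pop(entry_id, None)
        # Op.NOOP: leave the bank unchanged.


bank = MemoryBank()
bank.apply(Op.ADD, text="User adopted a dog.")
bank.apply(Op.UPDATE, text="User adopted two dogs.", entry_id=0)
bank.apply(Op.NOOP)
```

In this sketch a learned policy would replace the hard-coded calls at the bottom, choosing one operation per incoming piece of information.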
AI Jun 18, 2020
Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design
Xiufeng Yang, Tanuj Kr Aasawat, Kazuki Yoshizoe
It is common practice to use large computational resources to train neural networks, as is known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used to search for solutions to combinatorial optimization problems. In this paper, we propose a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm that works efficiently at the 1,000-worker scale, and apply it to molecular design. This is the first work that applies distributed MCTS to a real-world, non-game problem. Existing work on large-scale parallel MCTS shows efficient scalability in terms of the number of rollouts up to 100 workers, but suffers from degradation in the quality of the solutions. MP-MCTS maintains the search quality at larger scale: by running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with scores similar to those from non-parallel MCTS running for 42 hours. Moreover, our results based on parallel MCTS (combined with a simple RNN model) significantly outperform existing state-of-the-art work. Our method is generic and is expected to speed up other applications of MCTS.
RO May 6, 2019
Bee$^+$: A 95-mg Four-Winged Insect-Scale Flying Robot Driven by Twinned Unimorph Actuators
Xiufeng Yang, Ying Chen, Longlong Chang et al.
We introduce Bee$^+$, a 95-mg four-winged microrobot with improved controllability and open-loop-response characteristics with respect to those exhibited by state-of-the-art two-winged microrobots of the same size and similar weight (i.e., the 75-mg Harvard RoboBee). The key innovation that made the development of Bee$^+$ possible is the introduction of an extremely light (28-mg) pair of twinned unimorph actuators, which enabled the design of a new microrobotic mechanism that flaps four wings independently. The first main advantage of the proposed design, compared to those of two-winged flyers, is that by increasing the number of actuators from two to four, the number of direct control inputs increases from three to four when simple sinusoidal excitations are employed. A second advantage of Bee$^+$ is that its four-wing configuration and flapping mode naturally damp the rotational disturbances that commonly affect the yaw degree of freedom of two-winged microrobots. In addition, the proposed design greatly reduces the complexity of the associated fabrication process compared to that of other microrobots, as the unimorph actuators are fairly easy to build. Lastly, we hypothesize that, given the relatively low wing loading affecting their flapping mechanisms, the life expectancy of Bee$^+$s must be considerably higher than that of their two-winged counterparts. The functionality and basic capabilities of the robot are demonstrated through a set of simple control experiments.