0.8CLJun 4
Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systemsYingzhuo Liu
Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language: agents exchange messages token-by-token, verbalising their internal reasoning so that peers can read, verify, and respond. While convenient and interpretable, this protocol suffers from three structural drawbacks -- high inference cost, irreversible information loss during discretization, and ambiguity/redundancy of natural language. A growing body of work therefore explores an alternative protocol -- latent communication -- in which agents exchange continuous representations (embeddings, hidden states, or KV-caches) directly, bypassing the bottleneck of text generation. This paper presents a unified framework for organising the rapidly expanding literature on latent communication. We analyse existing methods along three orthogonal axes: (1) WHAT information is communicated (Embeddings, Hidden States, KV-Caches, or other continuous state); (2) WHICH sender-receiver alignment is used (latent-space alignment and layer alignment); and (3) HOW the communicated information is fused into the receiver (concatenation, prepending, mathematical operations, cross-attention, or cache restoration). Under this 3-axis framework, we systematically categorise eighteen representative methods proposed between 2024 and 2026, identify five major design patterns, and surface a set of open challenges -- including cross-architecture alignment, security of latent channels, compression for edge deployment, and the relationship between latent communication and latent chain-of-thought. We hope that this framework both lowers the barrier to entry for new researchers and provides a vocabulary for comparing future work.
MADec 30, 2025Code
Simulation-Free PSRO: Removing Game Simulation from Policy Space Response OraclesYingzhuo Liu, Shuodi Liu, Weijun Luo et al.
Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. However, the computational cost of PSRO has become a significant limitation to its practical application. Our analysis shows that game simulation is the primary bottleneck in PSRO's runtime. To address this issue, we conclude the concept of Simulation-Free PSRO and summarize existing methods that instantiate this concept. Additionally, we propose a novel Dynamic Window-based Simulation-Free PSRO, which introduces the concept of a strategy window to replace the original strategy set maintained in PSRO. The number of strategies in the strategy window is limited, thereby simplifying opponent strategy selection and improving the robustness of the best response. Moreover, we use Nash Clustering to select the strategy to be eliminated, ensuring that the number of strategies within the strategy window is effectively limited. Our experiments across various environments demonstrate that the Dynamic Window mechanism significantly reduces exploitability compared to existing methods, while also exhibiting excellent compatibility. Our code is available at https://github.com/enochliu98/SF-PSRO.
CLOct 20, 2025Code
Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language ModelsShuodi Liu, Yingzhuo Liu, Zi Wang et al.
Large language models (LLMs) have demonstrated remarkable reasoning and planning capabilities, driving extensive research into task decomposition. Existing task decomposition methods focus primarily on memory, tool usage, and feedback mechanisms, achieving notable success in specific domains, but they often overlook the trade-off between performance and cost. In this study, we first conduct a comprehensive investigation on task decomposition, identifying six categorization schemes. Then, we perform an empirical analysis of three factors that influence the performance and cost of task decomposition: categories of approaches, characteristics of tasks, and configuration of decomposition and execution models, uncovering three critical insights and summarizing a set of practical principles. Building on this analysis, we propose the Select-Then-Decompose strategy, which establishes a closed-loop problem-solving process composed of three stages: selection, execution, and verification. This strategy dynamically selects the most suitable decomposition approach based on task characteristics and enhances the reliability of the results through a verification module. Comprehensive evaluations across multiple benchmarks show that the Select-Then-Decompose consistently lies on the Pareto frontier, demonstrating an optimal balance between performance and cost. Our code is publicly available at https://github.com/summervvind/Select-Then-Decompose.
LGMar 11, 2024
RL-MSA: a Reinforcement Learning-based Multi-line bus Scheduling ApproachYingzhuo Liu
Multiple Line Bus Scheduling Problem (MLBSP) is vital to save operational cost of bus company and guarantee service quality for passengers. Existing approaches typically generate a bus scheduling scheme in an offline manner and then schedule buses according to the scheme. In practice, uncertain events such as traffic congestion occur frequently, which may make the pre-determined bus scheduling scheme infeasible. In this paper, MLBSP is modeled as a Markov Decision Process (MDP). A Reinforcement Learning-based Multi-line bus Scheduling Approach (RL-MSA) is proposed for bus scheduling at both the offline and online phases. At the offline phase, deadhead decision is integrated into bus selection decision for the first time to simplify the learning problem. At the online phase, deadhead decision is made through a time window mechanism based on the policy learned at the offline phase. We develop several new and useful state features including the features for control points, bus lines and buses. A bus priority screening mechanism is invented to construct bus-related features. Considering the interests of both the bus company and passengers, a reward function combining the final reward and the step-wise reward is devised. Experiments at the offline phase demonstrate that the number of buses used of RL-MSA is decreased compared with offline optimization approaches. At the online phase, RL-MSA can cover all departure times in a timetable (i.e., service quality) without increasing the number of buses used (i.e., operational cost).