Boyi Liu

h-index: 13

18 papers

658 citations

Novelty: 50%

AI Score: 34

Ranked #115,301 of 194,257 authors (top 59%); #25,349 in LG (top 63%)

18 Papers

10.7 · CL · Oct 10, 2023 · Code

Let Models Speak Ciphers: Multiagent Debate through Embeddings

Chau Pham, Boyi Liu, Yingxiang Yang et al.

Discussion and debate among Large Language Models (LLMs) have gained considerable attention due to their potential to enhance the reasoning ability of LLMs. Although natural language is an obvious choice for communication given LLMs' language understanding capabilities, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief across the entire vocabulary. In this paper, we introduce a communication regime named CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue. Specifically, we remove the token sampling step from LLMs and let them communicate their beliefs across the vocabulary through the expectation of the raw transformer output embeddings. Remarkably, by deviating from natural language, CIPHER offers the advantage of encoding a broader spectrum of information without any modification to the model weights, outperforming state-of-the-art LLM debate methods that use natural language by 0.5-5.0% across five reasoning tasks and multiple open-source LLMs of varying sizes. This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs. We anticipate that CIPHER will inspire further exploration of the design of interactions within LLM agent systems, offering a new direction that could significantly influence future developments in the field.
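CIPHER's key step replaces the sampled token with the expectation of the token embeddings under the model's next-token distribution. The sketch below illustrates that averaging with a toy 3-token vocabulary and made-up 2-dimensional embeddings; the function names `softmax` and `cipher_embedding` and the `temperature` parameter are our assumptions, not the authors' released code.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cipher_embedding(logits, embedding_table, temperature=1.0):
    """Expected token embedding under the next-token distribution.

    Instead of sampling one token and keeping only its embedding, weight
    every row of the embedding table by its probability and sum the rows.
    """
    probs = softmax(logits, temperature)
    dim = len(embedding_table[0])
    return [sum(p * row[j] for p, row in zip(probs, embedding_table))
            for j in range(dim)]

# Toy example: 3-token vocabulary, 2-dimensional embeddings (made-up values).
E = [[1.0, 0.0],   # token 0
     [0.0, 1.0],   # token 1
     [1.0, 1.0]]   # token 2
logits = [2.0, 1.0, 0.1]
print(cipher_embedding(logits, E))
```

With uniform logits the result is the plain mean of the embedding rows, and as the temperature approaches zero it approaches the embedding of the most likely token, which is what greedy decoding would have communicated.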

11.1 · LG · Sep 20, 2022

Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

Fengzhuo Zhang, Boyi Liu, Kaixin Wang et al.

The framework of cooperative Multi-Agent Reinforcement Learning (MARL) with permutation-invariant agents has achieved tremendous empirical success in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are, respectively, independent of and logarithmic in the number of agents, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound for the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents. Our improved generalization bound may be of independent interest and is applicable to other regression problems involving the transformer beyond MARL.
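Permutation invariance is the structural property this abstract relies on. A minimal way to see it, assuming a single fixed query vector `seed` (learned in practice, made up here) and dot-product scores, is attention pooling in the style of the Set Transformer's pooling layer: reordering the agents leaves the pooled output unchanged up to floating-point rounding. This illustrates the property only; it is not the paper's approximator.

```python
import math
import random

def attention_pool(seed, agent_features):
    """Pool a set of agent feature vectors with one query ('seed') vector.

    Each softmax weight depends only on that agent's own features, and the
    weighted sum does not depend on order, so permuting agents does not
    change the output (up to floating-point rounding).
    """
    scores = [sum(s * x for s, x in zip(seed, f)) for f in agent_features]
    m = max(scores)
    weights = [math.exp(sc - m) for sc in scores]
    z = sum(weights)
    dim = len(agent_features[0])
    return [sum(w * f[j] for w, f in zip(weights, agent_features)) / z
            for j in range(dim)]

random.seed(0)
agents = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
seed = [0.5, -0.2, 1.0]   # hypothetical query vector; learned in practice
original = attention_pool(seed, agents)
permuted = attention_pool(seed, random.sample(agents, len(agents)))
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(original, permuted)))
```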

15.6 · LG · Dec 30, 2022

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models

Yufeng Zhang, Boyi Liu, Qi Cai et al.

With the attention mechanism, transformers achieve significant empirical successes. Despite the intuitive understanding that transformers perform relational inference over long sequences to produce desirable representations, we lack a rigorous theory on how the attention mechanism achieves it. In particular, several intriguing questions remain open: (a) What makes a desirable representation? (b) How does the attention mechanism infer the desirable representation within the forward pass? (c) How does a pretraining procedure learn to infer the desirable representation through the backward pass? We observe that, as is the case in BERT and ViT, input tokens are often exchangeable since they already include positional encodings. The notion of exchangeability induces a latent variable model that is invariant to input sizes, which enables our theoretical analysis.
- To answer (a) on representation, we establish the existence of a sufficient and minimal representation of input tokens. In particular, such a representation instantiates the posterior distribution of the latent variable given input tokens, which plays a central role in predicting output labels and solving downstream tasks.
- To answer (b) on inference, we prove that attention with the desired parameter infers the latent posterior up to an approximation error, which is decreasing in input sizes. In detail, we quantify how attention approximates the conditional mean of the value given the key, which characterizes how it performs relational inference over long sequences.
- To answer (c) on learning, we prove that both supervised and self-supervised objectives allow empirical risk minimization to learn the desired parameter up to a generalization error, which is independent of input sizes. Particularly, in the self-supervised setting, we identify a condition number that is pivotal to solving downstream tasks.
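Point (b) says attention approximates the conditional mean of the value given the key. One way to see the intuition numerically, under our own toy assumptions (unit-norm 2-D keys on a circle, scalar values generated as f(angle) = cos(2 * angle), and an inverse-temperature `beta`), is to treat the softmax weights as a kernel that concentrates on keys near the query; the output then approaches f at the query's angle. This is an illustration only, not the paper's construction or proof.

```python
import math

def attention(query, keys, values, beta=1.0):
    """Dot-product attention for a single query with scalar values.

    Returns sum_i softmax(beta * <query, key_i>) * value_i, i.e. a
    kernel-weighted average of the values.
    """
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Toy data: unit-norm keys on a circle; each value is f(angle) = cos(2 * angle).
n = 2000
angles = [2 * math.pi * i / n for i in range(n)]
keys = [(math.cos(a), math.sin(a)) for a in angles]
values = [math.cos(2 * a) for a in angles]

theta = 0.7
query = (math.cos(theta), math.sin(theta))
estimate = attention(query, keys, values, beta=200.0)
print(estimate, math.cos(2 * theta))  # the two numbers should be close
```

Because the keys have equal norm, <query, key> = cos(angle difference), so the weights form a smooth bump around the query's angle; larger `beta` narrows the bump, and more keys reduce the discretization error.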

3.3 · MA · Feb 20, 2023

Differentiable Arbitrating in Zero-sum Markov Games

Jing Wang, Meichen Song, Feng Gao et al.

We initiate the study of how to perturb the reward in a two-player zero-sum Markov game to induce a desirable Nash equilibrium, a problem we call arbitrating. Such a problem admits a bi-level optimization formulation. The lower level requires solving for the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way. We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level. In particular, our method only requires a black-box solver for the (regularized) Nash equilibrium (NE). We develop a convergence analysis for the proposed framework with proper black-box NE solvers and demonstrate empirical success in two multi-agent reinforcement learning (MARL) environments.

1.2 · MA · Jan 11, 2023

An Efficient Approach to the Online Multi-Agent Path Finding Problem by Using Sustainable Information

Mingkai Tang, Boyi Liu, Yuanhang Li et al.

Multi-agent path finding (MAPF) is the problem of moving agents to their goal vertices without collisions. In the online MAPF problem, new agents may be added to the environment at any time, and the current agents have no information about future agents. Because existing online methods cannot reuse previous planning contexts, they perform redundant computation, which reduces algorithm efficiency. Hence, we propose a three-level approach to online MAPF that uses sustainable information to reduce these redundant calculations. The high-level solver, the Sustainable Replan algorithm (SR), manages the planning context and simulates the environment. The middle-level solver, the Sustainable Conflict-Based Search algorithm (SCBS), builds a conflict tree and maintains the planning context. The low-level solver, the Sustainable Reverse Safe Interval Path Planning algorithm (SRSIPP), is an efficient single-agent solver that uses the previous planning context to reduce duplicate calculations. Experiments show that our method significantly improves computational efficiency. In one of the test scenarios, our algorithm is on average 1.48 times faster than the state of the art across different agent-number settings.
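SRSIPP builds on Safe Interval Path Planning (SIPP), whose core data structure is the list of collision-free time intervals at each vertex. The hypothetical helper `safe_intervals` below, assuming discrete time steps and vertex conflicts only (edge conflicts are ignored), shows how one vertex's timeline splits into safe intervals given the time steps reserved by other agents. It is a sketch of the underlying notion, not the paper's sustainable or reverse variant.

```python
import math

def safe_intervals(occupied_times, horizon=math.inf):
    """Split the timeline [0, horizon] of one vertex into maximal safe intervals.

    occupied_times: integer time steps at which another agent occupies the vertex.
    Returns inclusive (start, end) pairs; end may be math.inf.
    """
    intervals = []
    start = 0
    for t in sorted(set(occupied_times)):
        if t > horizon:
            break
        if t > start:
            # A free stretch [start, t - 1] ends just before the reservation at t.
            intervals.append((start, t - 1))
        start = t + 1
    if start <= horizon:
        intervals.append((start, horizon))
    return intervals

print(safe_intervals([3, 4, 8]))  # [(0, 2), (5, 7), (9, inf)]
```

A single-agent planner then searches over (vertex, safe interval) pairs rather than (vertex, time step) pairs, which keeps the state space small when reservations are sparse.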

9.0 · LG · Sep 8, 2020 · Code

FedCM: A Real-time Contribution Measurement Method for Participants in Federated Learning

Boyi Liu, Bingjie Yan, Yize Zhou et al.

Federated Learning (FL) creates an ecosystem for multiple agents to collaborate on building models with data privacy in mind. A method for measuring the contribution of each agent in an FL system is critical for fair credit allocation, but few such methods have been proposed. In this paper, we develop FedCM, a real-time contribution measurement method that is simple but powerful. The method defines the impact of each agent and combines information from the current and previous rounds, using attention aggregation, to obtain each agent's contribution rate. Moreover, FedCM updates contributions every round, which enables it to operate in real time. Real-time operation is not considered by existing approaches, but it is critical for FL systems that must allocate computing power, communication resources, etc. Compared with the state-of-the-art method, the experimental results show that FedCM is more sensitive to data quantity and data quality while operating in real time. Furthermore, we developed open-source federated learning software based on FedCM. The software has been applied to identifying COVID-19 from medical images.

17.9 · LG · Jan 31, 2025

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Han Zhong, Yutong Yin, Shenao Zhang et al.

Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model's parameters. Theoretically, we demonstrate BRiTE's convergence at a rate of $1/T$ with $T$ representing the number of iterations. Empirical evaluations on math and coding benchmarks demonstrate that our approach consistently improves performance across different base models without requiring human-annotated thinking processes. In addition, BRiTE demonstrates superior performance compared to existing algorithms that bootstrap thinking processes using alternative methods such as rejection sampling, and can even match or exceed the results achieved through supervised fine-tuning with human-annotated data.

11.3 · SE · Nov 20, 2024

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Zhihan Liu, Shenao Zhang, Yongfei Liu et al.

Direct preference learning offers a promising and computationally efficient approach beyond supervised fine-tuning (SFT) for improving code generation in coding large language models (LMs). However, the scarcity of reliable preference data is a bottleneck that limits how much direct preference learning can improve the coding accuracy of code LMs. In this paper, we introduce Direct Preference Learning with Only Self-Generated Tests and Code (DSTC), a framework that leverages only self-generated code snippets and tests to construct reliable preference pairs such that direct preference learning can improve LM coding accuracy without external annotations. DSTC combines a minimax selection process and test-code concatenation to improve preference pair quality, reducing the influence of incorrect self-generated tests and enhancing model performance without the need for costly reward models. When applied with direct preference learning methods such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), DSTC yields stable improvements in coding accuracy (pass@1 score) across diverse coding benchmarks, including HumanEval, MBPP, and BigCodeBench, demonstrating both its effectiveness and its scalability to models of various sizes. This approach autonomously enhances code generation accuracy across LLMs of varying sizes, reducing reliance on expensive annotated coding datasets.

2.6 · LG · Mar 11, 2024

$\mathbf{(N,K)}$-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model

Yufeng Zhang, Liyu Chen, Boyi Liu et al.

Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale. Yet, there is a noticeable absence of a cost-effective and standardized testbed tailored to evaluating and comparing these algorithms. To bridge this gap, we present a generalized version of the 24-Puzzle: the $(N,K)$-Puzzle, which challenges language models to reach a target value $K$ with $N$ integers. We evaluate the effectiveness of established RL algorithms such as Proximal Policy Optimization (PPO), alongside novel approaches like Identity Preference Optimization (IPO) and Direct Preference Optimization (DPO).
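The abstract does not spell out the puzzle's exact rules, so the sketch below assumes classic 24-game rules generalized to $N$ numbers and target $K$: the four arithmetic operations, exact rational arithmetic, each integer used exactly once, and any parenthesization. The function names `solvable` and `_search` are hypothetical. A brute-force solver like this can decide whether an instance has any solution and produce one witness expression.

```python
from fractions import Fraction
from itertools import combinations

def solvable(numbers, target):
    """Return an expression string reaching `target` from `numbers`, or None.

    Assumes 24-game rules: +, -, *, / with exact rational arithmetic,
    every number used exactly once, any parenthesization.
    """
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))

def _search(items, target):
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Pick any two remaining operands, combine them, and recurse on the shorter list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [(a + b, f"({ea}+{eb})"), (a * b, f"({ea}*{eb})"),
                      (a - b, f"({ea}-{eb})"), (b - a, f"({eb}-{ea})")]
        if b != 0:
            candidates.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb}/{ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None

print(solvable([3, 3, 8, 8], 24))  # needs a fractional intermediate: 8/(3-8/3)
print(solvable([1, 1, 1, 1], 24))  # None: no combination reaches 24
```

Using `Fraction` rather than floats matters: instances such as (3, 3, 8, 8) with K = 24 are only solvable through non-integer intermediate values, which floating-point equality checks can miss.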

1.4 · CV · Sep 14, 2021

Foreground Object Structure Transfer for Unsupervised Domain Adaptation

Jieren Cheng, Le Liu, Xiangyan Tang et al.

Unsupervised domain adaptation aims to train a classification model on the labeled source domain for use on the unlabeled target domain. Since the data distributions of the two domains differ, the model often performs poorly on the target domain. Existing methods align the feature distributions of the source and target domains and learn domain-invariant features to improve the performance of the model. However, the features are usually aligned as a whole, and the domain adaptation task fails to serve classification, which ignores class information and leads to misalignment. In this paper, we investigate which features should be used for domain alignment, introduce prior knowledge to extract foreground features that guide the domain adaptation task for classification, and perform alignment on the local structure of objects. We propose a method called Foreground Object Structure Transfer (FOST). The key to FOST is a new clustering-based condition that incorporates the relative positions of foreground objects. Based on this condition, FOST makes the data distribution of each class more geometrically compact. In practice, since labels for the target domain are not available, we use the clustering information of the source domain to assign pseudo labels to the target domain samples, and then use prior knowledge from the source-domain data to guide those positive features so as to maximize the inter-class distance between different classes and minimize the intra-class distance. Extensive experimental results on various benchmarks (i.e., ImageCLEF-DA, Office-31, Office-Home, Visda-2017) under different domain adaptation settings show that FOST compares favorably against existing state-of-the-art domain adaptation methods.
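The pseudo-labeling step, assigning target samples labels from source-domain cluster information, can be sketched with a common simplification: nearest class centroid in feature space. This is our stand-in, not FOST's clustering-based condition, which also uses the relative positions of foreground objects; the 2-D features, the class names, the Euclidean distance, and the helper names `class_centroids` and `pseudo_label` are all assumptions.

```python
import math
from collections import defaultdict

def class_centroids(features, labels):
    """Mean feature vector for each class in the labeled source domain."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)]
            for y, xs in grouped.items()}

def pseudo_label(target_features, centroids):
    """Assign each unlabeled target sample the label of its nearest source centroid."""
    labels = []
    for x in target_features:
        nearest = min(centroids, key=lambda y: math.dist(x, centroids[y]))
        labels.append(nearest)
    return labels

# Toy 2-D features (made-up): two source classes and three target samples.
source_x = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.8, 3.2]]
source_y = ["cat", "cat", "dog", "dog"]
target_x = [[0.5, 0.4], [2.5, 2.9], [1.9, 2.0]]
centroids = class_centroids(source_x, source_y)
print(pseudo_label(target_x, centroids))  # ['cat', 'dog', 'dog']
```

Pseudo labels obtained this way can then drive class-conditional objectives, such as pulling same-class samples together and pushing different classes apart, on the target domain.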

5.4 · IR · May 23, 2020

COVID-19 Public Opinion and Emotion Monitoring System Based on Time Series Thermal New Word Mining

Yixian Zhang, Jieren Chen, Boyi Liu et al.

As new epidemics spread and develop, identifying how they change public emotions over time is of great reference value. We designed and implemented a COVID-19 public opinion monitoring system based on time-series thermal new word mining. We propose a new word structure discovery scheme based on the timing explosion of network topics and a Chinese sentiment analysis method for the COVID-19 public opinion environment. A "Scrapy-Redis-Bloomfilter" distributed crawler framework is established to collect data. The system can judge the positive and negative emotions of a reviewer based on their comments, and can also reflect the depth of seven emotions such as Hopeful, Happy, and Depressed. Finally, we improved the sentiment discriminant model of this system and compared its sentiment discriminant error on COVID-19-related comments with that of the Jiagu deep learning model. The results show that our model has better generalization ability and smaller discriminant error. We designed a large-screen data visualization that clearly shows the trend of public emotions, the proportion of each emotion category, keywords, hot topics, etc., and fully and intuitively reflects the development of public opinion.
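The crawler framework named above uses a Bloom filter to deduplicate URLs before they are requested. The sketch below is a minimal, standard Bloom filter in pure Python, not the Scrapy-Redis-Bloomfilter code itself (which, as the name suggests, presumably keeps its bit array in Redis); the bit-array size, the number of hash functions, the example URLs, and the double-hashing scheme over SHA-256 are our assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, a small false-positive
    rate, and a fixed-size bit array regardless of how many items are added."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions as h1 + i * h2 from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

seen = BloomFilter()
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
    if url in seen:
        print("skip duplicate:", url)
    else:
        seen.add(url)
        print("crawl:", url)
```

The trade-off is that a URL may occasionally be reported as seen when it was not (a false positive, so a page is skipped), but a URL that was added is never crawled twice.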

7.0 · RO · Mar 2, 2020

Design and Implementation of A Novel Precision Irrigation Robot Based on An Intelligent Path Planning Algorithm

Minghan Chen, Yilong Sun, Xueqing Cai et al.

The agricultural irrigation system is closely related to agricultural production. Current agricultural irrigation systems suffer from problems such as poor mobility, imprecision, and high cost. To address these issues, an intelligent irrigation robot is designed and implemented in this work. The robot achieves precise irrigation through an irrigation path planning algorithm improved with Bayesian theory. In the proposed algorithm, we use as much information as possible to achieve full-coverage irrigation in complex agricultural environments. In addition, we propose a maximum-risk approach to avoid leaving certain areas uninspected. Finally, we carried out simulation experiments and field experiments to verify the robot and the algorithm. The experimental results indicate that the robot is capable of fulfilling the requirements of various agricultural irrigation tasks.

19.9 · RO · Dec 24, 2019

Federated Imitation Learning: A Novel Framework for Cloud Robotic Systems with Heterogeneous Sensor Data

Boyi Liu, Lujia Wang, Ming Liu et al.

Humans are capable of learning a new behavior by observing others perform the skill. Similarly, robots can also do this through imitation learning. Furthermore, with external guidance, humans can master the new behavior more efficiently. So, how can robots achieve this? To address the issue, we present a novel framework named FIL. It provides a heterogeneous knowledge fusion mechanism for cloud robotic systems. Then, a knowledge fusion algorithm in FIL is proposed. It enables the cloud to fuse heterogeneous knowledge from local robots and generate guide models for robots with service requests. After that, we introduce a knowledge transfer scheme to help local robots acquire knowledge from the cloud. With FIL, a robot can use knowledge from other robots to improve the accuracy and efficiency of its imitation learning. Compared with transfer learning and meta-learning, FIL is more suitable for deployment in cloud robotic systems. Finally, we conduct experiments on a self-driving task for robots (cars). The experimental results demonstrate that the shared model generated by FIL increases the imitation learning efficiency of local robots in cloud robotic systems.

14.8 · RO · Sep 3, 2019

Federated Imitation Learning: A Privacy Considered Imitation Learning Framework for Cloud Robotic Systems with Heterogeneous Sensor Data

Boyi Liu, Lujia Wang, Ming Liu et al.

Humans are capable of learning a new behavior by observing others perform the skill. Robots can also do this through imitation learning. Furthermore, with external guidance, humans master the new behavior more efficiently. So how can robots achieve this? To address the issue, we present Federated Imitation Learning (FIL) in this paper. First, a knowledge fusion algorithm deployed on the cloud for fusing knowledge from local robots is presented. Then, effective transfer learning methods in FIL are introduced. With FIL, a robot can use knowledge from other robots to improve its imitation learning. FIL considers information privacy and data heterogeneity when robots share knowledge, and it is suitable for deployment in cloud robotic systems. Finally, we conduct experiments on a simplified self-driving task for robots (cars). The experimental results demonstrate that FIL improves the imitation learning of local robots in cloud robotic systems.

3.4 · LG · Jun 25, 2019

Traffic Flow Combination Forecasting Method Based on Improved LSTM and ARIMA

Boyi Liu, Xiangyan Tang, Jieren Cheng et al.

Traffic flow forecasting is a research hotspot in the construction of intelligent traffic systems. Existing traffic flow prediction methods suffer from problems such as poor stability, high data requirements, or poor adaptability. In this paper, we define the traffic data time singularity ratio in the dropout module and propose a combination prediction method based on an improved long short-term memory neural network and the time-series autoregressive integrated moving average model (SDLSTM-ARIMA), which is derived from the Recurrent Neural Network (RNN) model. It compares the traffic data time singularity with the probability value in the dropout module and combines them at unequal time intervals to achieve accurate prediction of traffic flow data. We then design an adaptive embedded traffic flow system that is compatible with Java, Python, and other languages and interfaces. The experimental results demonstrate that the method based on the SDLSTM-ARIMA model has higher accuracy than similar methods using only the autoregressive integrated moving average (ARIMA) or autoregressive (AR) model. Our embedded traffic prediction system, which integrates computer vision, machine learning, and the cloud, offers high accuracy, high reliability, and low cost. It therefore has broad application prospects.

23.7 · LG · Jun 25, 2019

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

Boyi Liu, Qi Cai, Zhuoran Yang et al.

Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate.
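The mirror-descent view is easiest to see in the tabular case, where a KL-regularized policy update has a closed form: maximizing the expected action value minus a KL penalty to the current policy gives pi_new(a) proportional to pi_old(a) * exp(step_size * Q(a)). The sketch below assumes a single state, a fixed made-up action-value vector `Q`, and a constant step size; the helper name `mirror_descent_step` is hypothetical. In the paper, the action-value function and the policy iterate are instead represented by overparametrized neural networks.

```python
import math

def mirror_descent_step(policy, q_values, step_size):
    """One KL-regularized mirror-descent policy update at a single state.

    Solves  max_pi  sum_a pi(a) * Q(a) - (1 / step_size) * KL(pi || policy),
    whose closed form is  pi(a) proportional to policy(a) * exp(step_size * Q(a)).
    """
    logits = [math.log(p) + step_size * q for p, q in zip(policy, q_values)]
    m = max(logits)                       # subtract max for numerical stability
    unnorm = [math.exp(x - m) for x in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]

pi = [1 / 3, 1 / 3, 1 / 3]   # uniform initial policy over 3 actions
Q = [1.0, 0.5, 0.0]          # fixed, made-up action values
for _ in range(50):
    pi = mirror_descent_step(pi, Q, step_size=0.5)
print([round(p, 4) for p in pi])
```

With Q held fixed, k steps compound to pi proportional to exp(k * step_size * Q), so the policy concentrates on the best action at a rate set by the step size; the small per-step KL penalty is what keeps each update conservative, in the spirit of PPO and TRPO.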

4.9 · RO · Mar 26, 2019

Recognition of Pyralidae Insects Using Intelligent Monitoring Autonomous Robot Vehicle in Natural Farm Scene

Boyi Liu, Zhuhua Hu, Yaochi Zhao et al.

Pyralidae pests, such as the corn borer and the rice leaf roller, are major pests of economic crops. Timely detection and identification of Pyralidae pests is a critical task for agriculturists and farmers. However, traditional identification of pests by humans is labor-intensive and inefficient. To tackle these challenges, this paper presents a pest-monitoring autonomous robot vehicle and a method to recognize Pyralidae pests. First, the robot on the autonomous vehicle collects images by camera sensing in a natural farm scene. Second, a total probability image is obtained by inverse histogram mapping, and the object contours of Pyralidae pests are then extracted quickly and accurately with a constrained Otsu method. Finally, using Hu moments together with perimeter and area characteristics, the correct object contours are drawn, and recognition results are obtained by comparing them with reference templates of Pyralidae pests. Additionally, the moving speed of the mechanical arms on the vehicle can be adjusted adaptively by interacting with the recognition algorithm. The experimental results demonstrate that the robot vehicle can automatically capture pest images and achieves 94.3% recognition accuracy in natural farm planting scenes.
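The constraint in the paper's constrained Otsu method is not described in this abstract, so the sketch below shows only standard Otsu thresholding on an intensity histogram, the kind of step that would be applied to the total probability image. The hypothetical `otsu_threshold` helper, the toy 10-bin histogram, and the unnormalized between-class variance w0 * w1 * (mu0 - mu1)^2 (same maximizer as the normalized form) are our choices; the constrained variant is not reproduced.

```python
def otsu_threshold(histogram):
    """Return the threshold t that maximizes between-class variance.

    histogram[i] is the number of pixels with intensity i. Pixels with
    intensity <= t form class 0 (background); the rest form class 1.
    """
    total = sum(histogram)
    sum_all = sum(i * h for i, h in enumerate(histogram))
    weight0 = 0
    sum0 = 0
    best_t, best_var = 0, -1.0
    for t, h in enumerate(histogram):
        weight0 += h
        sum0 += t * h
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue  # one class is empty; variance undefined
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Toy bimodal histogram over intensities 0..9 (made-up counts).
hist = [0, 5, 20, 5, 0, 0, 0, 4, 16, 4]
print(otsu_threshold(hist))  # 3
```

Ties are broken toward the lowest threshold here; with the gap between the two modes empty, thresholds 3 through 6 separate the classes identically.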

33.3 · RO · Jan 19, 2019

Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

Boyi Liu, Lujia Wang, Ming Liu

This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. Then, effective transfer learning methods in LFRL are introduced. LFRL is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRL greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRL is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRL.