Dawei Gao

LG
h-index25
25papers
1,641citations
Novelty44%
AI Score49

25 Papers

DBAug 29, 2023Code
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

Dawei Gao, Haibin Wang, Yaliang Li et al.

Large language models (LLMs) have emerged as a new paradigm for Text-to-SQL task. However, the absence of a systematical benchmark inhibits the development of designing effective, efficient and economic LLM-based Text-to-SQL solutions. To address this challenge, in this paper, we first conduct a systematical and extensive comparison over existing prompt engineering methods, including question representation, example selection and example organization, and with these experimental results, we elaborate their pros and cons. Based on these findings, we propose a new integrated solution, named DAIL-SQL, which refreshes the Spider leaderboard with 86.6% execution accuracy and sets a new bar. To explore the potential of open-source LLM, we investigate them in various scenarios, and further enhance their performance with supervised fine-tuning. Our explorations highlight open-source LLMs' potential in Text-to-SQL, as well as the advantages and disadvantages of the supervised fine-tuning. Additionally, towards an efficient and economic LLM-based Text-to-SQL solution, we emphasize the token efficiency in prompt engineering and compare the prior studies under this metric. We hope that our work provides a deeper understanding of Text-to-SQL with LLMs, and inspires further investigations and broad applications.

LGApr 11, 2022Code
FederatedScope: A Flexible Federated Learning Platform for Heterogeneity

Yuexiang Xie, Zhen Wang, Dawei Gao et al.

Although remarkable progress has been made by existing federated learning (FL) platforms to provide infrastructures for development, these platforms may not well tackle the challenges brought by various types of heterogeneity, including the heterogeneity in participants' local data, resources, behaviors and learning goals. To fill this gap, in this paper, we propose a novel FL platform, named FederatedScope, which employs an event-driven architecture to provide users with great flexibility to independently describe the behaviors of different participants. Such a design makes it easy for users to describe participants with various local training processes, learning goals and backends, and coordinate them into an FL course with synchronous or asynchronous training strategies. Towards an easy-to-use and flexible platform, FederatedScope enables rich types of plug-in operations and components for efficient further development, and we have implemented several important components to better help users with privacy protection, attack simulation and auto-tuning. We have released FederatedScope at https://github.com/alibaba/FederatedScope to promote academic research and industrial deployment of federated learning in a wide range of scenarios.

LGJun 8, 2022Code
pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning

Daoyuan Chen, Dawei Gao, Weirui Kuang et al.

Personalized Federated Learning (pFL), which utilizes and deploys distinct local models, has gained increasing attention in recent years due to its success in handling the statistical heterogeneity of FL clients. However, standardized evaluation and systematical analysis of diverse pFL methods remain a challenge. Firstly, the highly varied datasets, FL simulation settings and pFL implementations prevent easy and fair comparisons of pFL methods. Secondly, the current pFL literature diverges in the adopted evaluation and ablation protocols. Finally, the effectiveness and robustness of pFL methods are under-explored in various practical scenarios, such as the generalization to new clients and the participation of resource-limited clients. To tackle these challenges, we propose the first comprehensive pFL benchmark, pFL-Bench, for facilitating rapid, reproducible, standardized and thorough pFL evaluation. The proposed benchmark contains more than 10 dataset variants in various application domains with a unified data partition and realistic heterogeneous settings; a modularized and easy-to-extend pFL codebase with more than 20 competitive pFL method implementations; and systematic evaluations under containerized environments in terms of generalization, fairness, system overhead, and convergence. We highlight the benefits and potential of state-of-the-art pFL methods and hope the pFL-Bench enables further pFL research and broad applications that would otherwise be difficult owing to the absence of a dedicated benchmark. The code is released at https://github.com/alibaba/FederatedScope/tree/master/benchmark/pFL-Bench.

LGSep 5, 2023Code
Data-Juicer: A One-Stop Data Processing System for Large Language Models

Daoyuan Chen, Yilun Huang, Zhijian Ma et al.

The immense evolution in Large Language Models (LLMs) has underscored the importance of massive, heterogeneous, and high-quality data. A data recipe is a mixture of data from different sources for training LLMs, which plays a vital role in LLMs' performance. Existing open-source tools for LLM data processing are mostly tailored for specific data recipes. To continuously uncover the potential of LLMs, incorporate data from new sources, and improve LLMs' performance, we build a new system named Data-Juicer, with which we can efficiently generate diverse data recipes, explore different possibilities in forming data mixtures, and evaluate their effects on model performance. Different from traditional data-analytics pipelines, Data-Juicer faces some unique challenges. Firstly, the possible data sources for forming data recipes are truly heterogeneous and massive with various qualities. Secondly, it is extremely expensive to precisely evaluate data recipes' impact on LLMs' performance. Thirdly, the end users of Data-Juicer, model developers, need sufficient flexibility to configure and evaluate different data recipes. Data-Juicer features a fine-grained abstraction of pipelines for constructing data recipes, with over 50 built-in operators for easy composition and extension. By incorporating visualization and auto-evaluation capabilities, Data-Juicer enables a timely feedback loop for both LLM pre-training and fine-tuning. Further, Data-Juicer is optimized and integrated with ecosystems for LLM training, evaluation, and distributed computing. The data recipes derived with Data-Juicer gain notable improvements on state-of-the-art LLMs, by up to 7.45% increase in averaged score across 16 LLM benchmarks and 17.5% higher win rate in pair-wise GPT-4 evaluations. Our system, data recipes, and tutorials are released, calling for broader data-centric research on training and understanding LLMs.

LGJun 7, 2022Code
A Benchmark for Federated Hetero-Task Learning

Liuyi Yao, Dawei Gao, Zhen Wang et al.

To investigate the heterogeneity in federated learning in real-world scenarios, we generalize the classic federated learning to federated hetero-task learning, which emphasizes the inconsistency across the participants in federated learning in terms of both data distribution and learning tasks. We also present B-FHTL, a federated hetero-task learning benchmark consisting of simulation dataset, FL protocols and a unified evaluation mechanism. B-FHTL dataset contains three well-designed federated learning tasks with increasing heterogeneity. Each task simulates the clients with different non-IID data and learning tasks. To ensure fair comparison among different FL algorithms, B-FHTL builds in a full suite of FL protocols by providing high-level APIs to avoid privacy leakage, and presets most common evaluation metrics spanning across different learning tasks, such as regression, classification, text generation and etc. Furthermore, we compare the FL algorithms in fields of federated multi-task learning, federated personalization and federated meta learning within B-FHTL, and highlight the influence of heterogeneity and difficulties of federated hetero-task learning. Our benchmark, including the federated dataset, protocols, the evaluation mechanism and the preliminary experiment, is open-sourced at https://github.com/alibaba/FederatedScope/tree/master/benchmark/B-FHTL

MAJul 25, 2024Code
Very Large-Scale Multi-Agent Simulation in AgentScope

Xuchen Pan, Dawei Gao, Yuexiang Xie et al.

Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfied agent diversity, and effort-intensive management processes. To address these challenges, we develop several new features and components for AgentScope, a user-friendly multi-agent platform, enhancing its convenience and flexibility for supporting very large-scale multi-agent simulations. Specifically, we propose an actor-based distributed mechanism as the underlying technological infrastructure towards great scalability and high efficiency, and provide flexible environment support for simulating various real-world scenarios, which enables parallel execution of multiple agents, automatic workflow conversion for distributed deployment, and both inter-agent and agent-environment interactions. Moreover, we integrate an easy-to-use configurable tool and an automatic background generation pipeline in AgentScope, simplifying the process of creating agents with diverse yet detailed background settings. Last but not least, we provide a web-based interface for conveniently monitoring and managing a large number of agents that might deploy across multiple devices. We conduct a comprehensive simulation to demonstrate the effectiveness of these proposed enhancements in AgentScope, and provide detailed observations and insightful discussions to highlight the great potential of applying multi-agent systems in large-scale simulations. The source code is released on GitHub at https://github.com/modelscope/agentscope/tree/main/examples/paper_large_scale_simulation to inspire further research and development in large-scale multi-agent simulations.

CLJul 16, 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study

Peiyu Liu, Zikang Liu, Ze-Feng Gao et al.

Despite the superior performance, Large Language Models~(LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs as well as increasing the inference rate. However, a major challenge is that low-bit quantization methods often lead to performance degradation. It is important to understand how quantization impacts the capacity of LLMs. Different from previous studies focused on overall performance, this work aims to investigate the impact of quantization on \emph{emergent abilities}, which are important characteristics that distinguish LLMs from small language models. Specially, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction-following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation on the test of these abilities. To improve the performance of low-bit models, we conduct two special experiments: (1) fine-gained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings to understand the impact of quantization on emergent abilities, and sheds lights on the possibilities of extremely low-bit quantization for LLMs.

LGMar 23, 2023
FS-Real: Towards Real-World Cross-Device Federated Learning

Daoyuan Chen, Dawei Gao, Yuexiang Xie et al.

Federated Learning (FL) aims to train high-quality models in collaboration with distributed clients while not uploading their local data, which attracts increasing attention in both academia and industry. However, there is still a considerable gap between the flourishing FL research and real-world scenarios, mainly caused by the characteristics of heterogeneous devices and its scales. Most existing works conduct evaluations with homogeneous devices, which are mismatched with the diversity and variability of heterogeneous devices in real-world scenarios. Moreover, it is challenging to conduct research and development at scale with heterogeneous devices due to limited resources and complex software stacks. These two key factors are important yet underexplored in FL research as they directly impact the FL training dynamics and final performance, making the effectiveness and usability of FL algorithms unclear. To bridge the gap, in this paper, we propose an efficient and scalable prototyping system for real-world cross-device FL, FS-Real. It supports heterogeneous device runtime, contains parallelism and robustness enhanced FL server, and provides implementations and extensibility for advanced FL utility features such as personalization, communication compression and asynchronous aggregation. To demonstrate the usability and efficiency of FS-Real, we conduct extensive experiments with various device distributions, quantify and analyze the effect of the heterogeneous device and various scales, and further provide insights and open discussions about real-world FL scenarios. Our system is released to help to pave the way for further real-world FL research and broad applications involving diverse devices and scales.

SPOct 8, 2022
Signal Detection in MIMO Systems with Hardware Imperfections: Message Passing on Neural Networks

Dawei Gao, Qinghua Guo, Guisheng Liao et al.

In this paper, we investigate signal detection in multiple-input-multiple-output (MIMO) communication systems with hardware impairments, such as power amplifier nonlinearity and in-phase/quadrature imbalance. To deal with the complex combined effects of hardware imperfections, neural network (NN) techniques, in particular deep neural networks (DNNs), have been studied to directly compensate for the impact of hardware impairments. However, it is difficult to train a DNN with limited pilot signals, hindering its practical applications. In this work, we investigate how to achieve efficient Bayesian signal detection in MIMO systems with hardware imperfections. Characterizing combined hardware imperfections often leads to complicated signal models, making Bayesian signal detection challenging. To address this issue, we first train an NN to "model" the MIMO system with hardware imperfections and then perform Bayesian inference based on the trained NN. Modelling the MIMO system with NN enables the design of NN architectures based on the signal flow of the MIMO system, minimizing the number of NN layers and parameters, which is crucial to achieving efficient training with limited pilot signals. We then represent the trained NN with a factor graph, and design an efficient message passing based Bayesian signal detector, leveraging the unitary approximate message passing (UAMP) algorithm. The implementation of a turbo receiver with the proposed Bayesian detector is also investigated. Extensive simulation results demonstrate that the proposed technique delivers remarkably better performance than state-of-the-art methods.

SPNov 9, 2022
Hyper-Parameter Auto-Tuning for Sparse Bayesian Learning

Dawei Gao, Qinghua Guo, Ming Jin et al.

Choosing the values of hyper-parameters in sparse Bayesian learning (SBL) can significantly impact performance. However, the hyper-parameters are normally tuned manually, which is often a difficult task. Most recently, effective automatic hyper-parameter tuning was achieved by using an empirical auto-tuner. In this work, we address the issue of hyper-parameter auto-tuning using neural network (NN)-based learning. Inspired by the empirical auto-tuner, we design and learn a NN-based auto-tuner, and show that considerable improvement in convergence rate and recovery performance can be achieved.

MAFeb 21, 2024Code
AgentScope: A Flexible yet Robust Multi-Agent Platform

Dawei Gao, Zitao Li, Xuchen Pan et al.

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications. However, the complexities in coordinating agents' cooperation and LLMs' erratic performance pose notable challenges in developing robust and efficient multi-agent applications. To tackle these challenges, we propose AgentScope, a developer-centric multi-agent platform with message exchange as its core communication mechanism. The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment. Towards robust and flexible multi-agent application, AgentScope provides both built-in and customizable fault tolerance mechanisms. At the same time, it is also armed with system-level support for managing and utilizing multi-modal data, tools, and external knowledge. Additionally, we design an actor-based distribution framework, enabling easy conversion between local and distributed deployments and automatic parallel optimization without extra effort. With these features, AgentScope empowers developers to build applications that fully realize the potential of intelligent agents. We have released AgentScope at https://github.com/modelscope/agentscope, and hope AgentScope invites wider participation and innovation in this fast-moving field.

CLJan 7, 2024Code
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning

Yingqian Min, Kun Zhou, Dawei Gao et al.

Recently, multi-task instruction tuning has been applied into sentence representation learning, which endows the capability of generating specific representations with the guidance of task instruction, exhibiting strong generalization ability on new tasks. However, these methods mostly neglect the potential interference problems across different tasks and instances, which may affect the training and convergence of the model. To address it, we propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training, to minimize the interference risks from the two views. In the task level, we aim to find the optimal task order to minimize the total cross-task interference risk, which is exactly the traveling salesman problem, hence we utilize a simulated annealing algorithm to find its solution. In the instance level, we measure the difficulty of all instances per task, then divide them into the easy-to-difficult mini-batches for training. Experiments on MTEB sentence representation evaluation tasks show that our approach can boost the performance of state-of-the-art methods. Our code and data are publicly available at the link: \url{https://github.com/RUCAIBox/Data-CUBE}.

AIFeb 13, 2025Code
KIMAs: A Configurable Knowledge Integrated Multi-Agent System

Zitao Li, Fei Wei, Yuexiang Xie et al.

Knowledge-intensive conversations supported by large language models (LLMs) have become one of the most popular and helpful applications that can assist people in different aspects. Many current knowledge-intensive applications are centered on retrieval-augmented generation (RAG) techniques. While many open-source RAG frameworks facilitate the development of RAG-based applications, they often fall short in handling practical scenarios complicated by heterogeneous data in topics and formats, conversational context management, and the requirement of low-latency response times. This technical report presents a configurable knowledge integrated multi-agent system, KIMAs, to address these challenges. KIMAs features a flexible and configurable system for integrating diverse knowledge sources with 1) context management and query rewrite mechanisms to improve retrieval accuracy and multi-turn conversational coherency, 2) efficient knowledge routing and retrieval, 3) simple but effective filter and reference generation mechanisms, and 4) optimized parallelizable multi-agent pipeline execution. Our work provides a scalable framework for advancing the deployment of LLMs in real-world settings. To show how KIMAs can help developers build knowledge-intensive applications with different scales and emphases, we demonstrate how we configure the system to three applications already running in practice with reliable performance.

CLFeb 17, 2025Code
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models

Zikang Liu, Kun Zhou, Wayne Xin Zhao et al.

Visual instruction tuning has become the predominant technology in eliciting the multimodal task-solving capabilities of large vision-language models (LVLMs). Despite the success, as visual instructions require images as the input, it would leave the gap in inheriting the task-solving capabilities from the backbone LLMs, and make it costly to collect a large-scale dataset. To address it, we propose ViFT, a visual instruction-free fine-tuning framework for LVLMs. In ViFT, we only require the text-only instructions and image caption data during training, to separately learn the task-solving and visual perception abilities. During inference, we extract and combine the representations of the text and image inputs, for fusing the two abilities to fulfill multimodal tasks. Experimental results demonstrate that ViFT can achieve state-of-the-art performance on several visual reasoning and visual instruction following benchmarks, with rather less training data. Our code and data will be publicly released.

CLMar 14, 2024Code
Less is More: High-value Data Selection for Visual Instruction Tuning

Zikang Liu, Kun Zhou, Wayne Xin Zhao et al.

Visual instruction tuning is the key to building large vision language models~(LVLMs), which can greatly improve the task generalization and solving capabilities by learning a mixture of instruction data from diverse visual tasks. Previous work mostly collects multiple existing visual instruction datasets via heuristic ways for training (even more than a million instructions), which may introduce data redundancy and enlarge the training cost. To investigate this issue, we conduct a series of empirical studies, which reveal a significant redundancy within the visual instruction datasets, and show that greatly reducing the amount of instructions from several tasks even do not affect the performance. Based on the findings, we propose a high-value data selection approach TIVE, to eliminate redundancy within the visual instruction data and reduce the training cost. In TIVE, we first estimate the instance influence score on its corresponding task, and the task difficulty score, based on the gradient-based influence functions. Then, we leverage the two kinds of scores to determine the task proportion within the selected visual instruction subset, and select high-value instances for each task, respectively. Experiments on various LVLMs show that our approach using only about 15% data can achieve comparable average performance to the full-data fine-tuned model across eight benchmarks, even surpassing it on four of the benchmarks. Our code and data will be publicly released.

LGSep 1, 2023Code
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning

Weirui Kuang, Bingchen Qian, Zitao Li et al.

LLMs have demonstrated great capabilities in various NLP tasks. Different entities can further improve the performance of those LLMs on their specific downstream tasks by fine-tuning LLMs. When several entities have similar interested tasks, but their data cannot be shared because of privacy concerns regulations, federated learning (FL) is a mainstream solution to leverage the data of different entities. However, fine-tuning LLMs in federated learning settings still lacks adequate support from existing FL frameworks because it has to deal with optimizing the consumption of significant communication and computational resources, data preparation for different tasks, and distinct information protection demands. This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution, which consists of the following components: (1) we build an end-to-end benchmarking pipeline, automizing the processes of dataset preprocessing, federated fine-tuning execution, and performance evaluation on federated LLM fine-tuning; (2) we provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios with low communication and computation costs, even without accessing the full model; (3) we adopt several accelerating and resource-efficient operators for fine-tuning LLMs with limited resources and the flexible pluggable sub-routines for interdisciplinary study. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings, which also yields valuable insights into federated fine-tuning LLMs for the research community. To facilitate further research and adoption, we release FS-LLM at https://github.com/alibaba/FederatedScope/tree/llm.

LGMay 23, 2025
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Xuchen Pan, Yanxi Chen, Yushuo Chen et al.

Trinity-RFT is a general-purpose, unified and easy-to-use framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a modular and decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT; (2) seamless integration for agent-environment interaction with high efficiency and robustness; and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted for diverse application scenarios, and serves as a unified platform for development and research of advanced reinforcement learning paradigms at both macroscopic and microscopic levels. This technical report outlines the vision, features, design and implementations of Trinity-RFT, accompanied by extensive examples, applications and experiments that demonstrate its functionalities and user-friendliness.

CVFeb 15, 2025
SEM-CLIP: Precise Few-Shot Learning for Nanoscale Defect Detection in Scanning Electron Microscope Image

Qian Jin, Yuqi Jiang, Xudong Lu et al.

In the field of integrated circuit manufacturing, the detection and classification of nanoscale wafer defects are critical for subsequent root cause analysis and yield enhancement. The complex background patterns observed in scanning electron microscope (SEM) images and the diverse textures of the defects pose significant challenges. Traditional methods usually suffer from insufficient data, labels, and poor transferability. In this paper, we propose a novel few-shot learning approach, SEM-CLIP, for accurate defect classification and segmentation. SEM-CLIP customizes the Contrastive Language-Image Pretraining (CLIP) model to better focus on defect areas and minimize background distractions, thereby enhancing segmentation accuracy. We employ text prompts enriched with domain knowledge as prior information to assist in precise analysis. Additionally, our approach incorporates feature engineering with textual guidance to categorize defects more effectively. SEM-CLIP requires little annotated data, substantially reducing labor demands in the semiconductor industry. Extensive experimental validation demonstrates that our model achieves impressive classification and segmentation results under few-shot learning scenarios.

AIAug 22, 2025
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Dawei Gao, Zitao Li, Yuexiang Xie et al.

Driven by rapid advancements of Large Language Models (LLMs), agents are empowered to combine intrinsic knowledge with dynamic tool use, greatly enhancing their capacity to address real-world tasks. In line with such an evolution, AgentScope introduces major improvements in a new version (1.0), towards comprehensively supporting flexible and efficient tool-based agent-environment interactions for building agentic applications. Specifically, we abstract foundational components essential for agentic applications and provide unified interfaces and extensible modules, enabling developers to easily leverage the latest progress, such as new models and MCPs. Furthermore, we ground agent behaviors in the ReAct paradigm and offer advanced agent-level infrastructure based on a systematic asynchronous design, which enriches both human-agent and agent-agent interaction patterns while improving execution efficiency. Building on this foundation, we integrate several built-in agents tailored to specific practical scenarios. AgentScope also includes robust engineering support for developer-friendly experiences. We provide a scalable evaluation module with a visual studio interface, making the development of long-trajectory agentic applications more manageable and easier to trace. In addition, AgentScope offers a runtime sandbox to ensure safe agent execution and facilitates rapid deployment in production environments. With these enhancements, AgentScope provides a practical foundation for building scalable, adaptive, and effective agentic applications.

CVOct 1, 2025
PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

Zikang Liu, Junyi Li, Wayne Xin Zhao et al.

Graphical User Interface (GUI) agents powered by Multimodal Large Language Models (MLLMs) promise human-like interaction with software applications, yet long-horizon tasks remain challenging due to memory limitations. Existing approaches either truncate history or rely on simple textual summaries, which risk losing critical information when past visual details become necessary for future decisions. In this paper, we propose \textbf{PAL-UI} (\textbf{P}lanning with \textbf{A}ctive \textbf{L}ook-back), a novel framework that enables GUI agents to adaptively retrieve past observations when required. PAL-UI combines a dual-level summarization agent, capturing both observation-level cues and action-level outcomes, with a dedicated retrieval tool that allows the agent to recall specific historical screenshots during planning. We curate a step-level instruction dataset of 8.6K samples from mobile GUI navigation trajectories and train \textbf{PAL-UI-3B} and \textbf{PAL-UI-7B} models based on Qwen2.5-VL. Extensive experiments demonstrate that PAL-UI significantly outperforms baseline models and prior methods in mobile GUI navigation tasks, even under data-efficient settings. Moreover, PAL-UI exhibits strong cross-domain generalization, achieving notable improvements in web navigation without additional training. Our work highlights the potential of active memory retrieval for long-horizon planning capabilities of vision-based GUI agents.

LGSep 27, 2025
PHASE: Physics-Integrated, Heterogeneity-Aware Surrogates for Scientific Simulations

Dawei Gao, Dali Wang, Zhuowei Gu et al.

Large-scale numerical simulations underpin modern scientific discovery but remain constrained by prohibitive computational costs. AI surrogates offer acceleration, yet adoption in mission-critical settings is limited by concerns over physical plausibility, trustworthiness, and the fusion of heterogeneous data. We introduce PHASE, a modular deep-learning framework for physics-integrated, heterogeneity-aware surrogates in scientific simulations. PHASE combines data-type-aware encoders for heterogeneous inputs with multi-level physics-based constraints that promote consistency from local dynamics to global system behavior. We validate PHASE on the biogeochemical (BGC) spin-up workflow of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SM) Land Model (ELM), presenting-to our knowledge-the first scientifically validated AI-accelerated solution for this task. Using only the first 20 simulation years, PHASE infers a near-equilibrium state that otherwise requires more than 1,200 years of integration, yielding an effective reduction in required integration length by at least 60x. The framework is enabled by a pipeline for fusing heterogeneous scientific data and demonstrates strong generalization to higher spatial resolutions with minimal fine-tuning. These results indicate that PHASE captures governing physical regularities rather than surface correlations, enabling practical, physically consistent acceleration of land-surface modeling and other complex scientific workflows.

LGMay 4, 2023
Efficient Personalized Federated Learning via Sparse Model-Adaptation

Daoyuan Chen, Liuyi Yao, Dawei Gao et al.

Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data. Due to the heterogeneity of clients' local data distribution, recent studies explore the personalized FL that learns and deploys distinct local models with the help of auxiliary global models. However, the clients can be heterogeneous in terms of not only local data distribution, but also their computation and communication resources. The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. With a lightweight trainable gating layer, pFedGate enables clients to reach their full potential in model capacity by generating different sparse models accounting for both the heterogeneous data distributions and resource constraints. Meanwhile, the computation and communication efficiency are both improved thanks to the adaptability between the model sparsity and clients' resources. Further, we theoretically show that the proposed pFedGate has superior complexity with guaranteed convergence and generalization error. Extensive experiments show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods. We also demonstrate that pFedGate performs better than competitors in the novel clients participation and partial clients participation scenarios, and can learn meaningful sparse local models adapted to different data distributions.

SPJul 1, 2020
Massive MIMO As an Extreme Learning Machine

Dawei Gao, Qinghua Guo, Yonina C. Eldar

This work shows that a massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM). The receive antennas at the base station serve as the hidden nodes of the ELM, and the low-resolution ADCs act as the ELM activation function. By adding random biases to the received signals and optimizing the ELM output weights, the system can effectively tackle hardware impairments, such as the nonlinearity of power amplifiers and the low-resolution ADCs. Moreover, the fast adaptive capability of ELM allows the design of an adaptive receiver to address time-varying effects of MIMO channels. Simulations demonstrate the promising performance of the ELM-based receiver compared to conventional receivers in dealing with hardware impairments.

LGMay 23, 2019
Pruning-Aware Merging for Efficient Multitask Inference

Xiaoxi He, Dawei Gao, Zimu Zhou et al.

Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost due to task relatedness. A promising remedy is to merge the networks into a multitask network to eliminate redundancy across tasks before network pruning. However, pruning a multitask network combined by existing network merging schemes cannot minimise the computation cost of every task combination because they do not consider such a future pruning. To this end, we theoretically identify the conditions such that pruning a multitask network minimises the computation of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme to construct a multitask network that approximates these conditions. The merged network is then ready to be further pruned by existing network pruning methods. Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87x less computation against the baseline without network merging, and up to 2.01x less computation against the baseline with a state-of-the-art network merging scheme.

SPFeb 27, 2019
Extreme Learning Machine-Based Receiver for MIMO LED Communications

Dawei Gao, Qinghua Guo

This work concerns receiver design for light-emitting diode (LED) multiple input multiple output (MIMO) communications where the LED nonlinearity can severely degrade the performance of communications. In this paper, we propose an extreme learning machine (ELM) based receiver to jointly handle the LED nonlinearity and cross-LED interference, and a circulant input weight matrix is employed, which significantly reduces the complexity of the receiver with the fast Fourier transform (FFT). It is demonstrated that the proposed receiver can efficiently handle the LED nonlinearity and cross-LED interference.