Guodong Long

LG
h-index51
113papers
32,657citations
Novelty51%
AI Score61

113 Papers

CLSep 12, 2023Code
Re-Reading Improves Reasoning in Large Language Models

Xiaohan Xu, Chongyang Tao, Tao Shen et al. · microsoft-research

To enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs), we introduce a simple, yet general and effective prompting method, Re2, i.e., \textbf{Re}-\textbf{Re}ading the question as input. Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), which aim to elicit the reasoning process in the output, Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. Consequently, Re2 demonstrates strong generality and compatibility with most thought-eliciting prompting methods, including CoT. Crucially, Re2 facilitates a "bidirectional" encoding in unidirectional decoder-only LLMs because the first pass could provide global information for the second pass. We begin with a preliminary empirical study as the foundation of Re2, illustrating its potential to enable "bidirectional" attention mechanisms. We then evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality. Our findings indicate that, with the exception of a few scenarios on vanilla ChatGPT, Re2 consistently enhances the reasoning performance of LLMs through a simple re-reading strategy. Further analyses reveal Re2's adaptability, showing how it can be effectively integrated with different LLMs, thought-eliciting prompting, and ensemble strategies. Our code is available at \url{https://github.com/Tebmer/Rereading-LLM-Reasoning/}

IRJan 16, 2023Code
Dual Personalization on Federated Recommendation

Chunxu Zhang, Guodong Long, Tianyi Zhou et al.

Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available.

IRJun 16, 2022
Towards Robust Ranker for Text Retrieval

Yucheng Zhou, Tao Shen, Xiubo Geng et al. · microsoft-research

A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives or/and serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a high-capable ranker. Thereby, we propose multiple retrievers as negative generators improve the ranker's robustness, where i) involving extensive out-of-distribution label noises renders the ranker against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging thus effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.

IRDec 20, 2022
Fine-Grained Distillation for Long Document Retrieval

Yucheng Zhou, Tao Shen, Xiubo Geng et al. · microsoft-research

Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their structure heterogeneity and poses a granular-mismatch issue, leading to an inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces global-consistent representations crossing different fine granularity and then applies multi-granular aligned distillation merely during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, which show state-of-the-art performance.

CRSep 21, 2022
Federated Learning from Pre-Trained Models: A Contrastive Learning Approach

Yue Tan, Guodong Long, Jie Ma et al. · amazon-science

Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. In this work, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.

IRMay 23, 2022
UnifieR: A Unified Retriever for Large-Scale Retrieval

Tao Shen, Xiubo Geng, Chongyang Tao et al. · microsoft-research

Large-scale retrieval is to recall relevant documents from a huge collection given a query. It relies on representation learning to embed documents and queries into a common semantic encoding space. According to the encoding space, recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. These two paradigms unveil the PLMs' representation capability in different granularities, i.e., global sequence-level compression and local word-level contexts, respectively. Inspired by their complementary global-local contextualization and distinct representing views, we propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability. Experiments on passage retrieval benchmarks verify its effectiveness in both paradigms. A uni-retrieval scheme is further presented with even better retrieval quality. We lastly evaluate the model on BEIR benchmark to verify its transferability.

IRAug 16, 2024Code
Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

Zhiwei Li, Guodong Long, Tianyi Zhou et al.

Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework with preserving privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information into a user embedding vector. However, the user embedding is usually insufficient to preserve the rich information of the fine-grained personalization across heterogeneous clients. This paper proposes a novel personalized FedCF method by preserving users' personalized information into a latent variable and a neural model simultaneously. Specifically, we decompose the modeling of user knowledge into two encoders, each designed to capture shared knowledge and personalized knowledge separately. A personalized gating network is then applied to balance personalization and generalization between the global and local encoders. Moreover, to effectively train the proposed framework, we model the CF problem as a specialized Variational AutoEncoder (VAE) task by integrating user interaction vector reconstruction with missing value prediction. The decoder is trained to reconstruct the implicit feedback from items the user has interacted with, while also predicting items the user might be interested in but has not yet interacted with. Experimental results on benchmark datasets demonstrate that the proposed method outperforms other baseline methods, showcasing superior performance. Our code is available at https://github.com/mtics/FedDAE.

LGJul 4, 2023
Causal Reinforcement Learning: A Survey

Zhihong Deng, Jing Jiang, Guodong Long et al. · pku

Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.

LGApr 9, 2023
Does Continual Learning Equally Forget All Parameters?

Haiyan Zhao, Tianyi Zhou, Guodong Long et al. · uw

Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks. Although it can be alleviated by repeatedly replaying buffered data, the every-step replay is time-consuming. In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. Our proposed metrics show that only a few modules are more task-specific and sensitively alter between tasks, while others can be shared across tasks as common knowledge. Hence, we attribute forgetting mainly to the former and find that finetuning them only on a small buffer at the end of any CL method can bring non-trivial improvement. Due to the small number of finetuned parameters, such ``Forgetting Prioritized Finetuning (FPF)'' is efficient in computation. We further propose a more efficient and simpler method that entirely removes the every-step replay and replaces them by only $k$-times of FPF periodically triggered during CL. Surprisingly, this ``$k$-FPF'' performs comparably to FPF and outperforms the SOTA CL methods but significantly reduces their computational overhead and cost. In experiments on several benchmarks of class- and domain-incremental CL, FPF consistently improves existing CL methods by a large margin, and $k$-FPF further excels in efficiency without degrading the accuracy. We also empirically studied the impact of buffer size, epochs per task, and finetuning modules on the cost and accuracy of our methods.

LGMay 20, 2022
FedNoiL: A Simple Two-Level Sampling Method for Federated Learning with Noisy Labels

Zhuowei Wang, Tianyi Zhou, Guodong Long et al. · uw

Federated learning (FL) aims at training a global model on the server side while the training data are collected and located at the local devices. Hence, the labels in practice are usually annotated by clients of varying expertise or criteria and thus contain different amounts of noises. Local training on noisy labels can easily result in overfitting to noisy labels, which is devastating to the global model through aggregation. Although recent robust FL methods take malicious clients into account, they have not addressed local noisy labels on each device and the impact to the global model. In this paper, we develop a simple two-level sampling method "FedNoiL" that (1) selects clients for more robust global aggregation on the server; and (2) selects clean labels and correct pseudo-labels at the client end for more robust local training. The sampling probabilities are built upon clean label detection by the global model. Moreover, we investigate different schedules changing the local epochs between aggregations over the course of FL, which notably improves the communication and computation efficiency in noisy label setting. In experiments with homogeneous/heterogeneous data distributions and noise ratios, we observed that direct combinations of SOTA FL methods with SOTA noisy-label learning methods can easily fail but our method consistently achieves better and robust performance.

LGNov 23, 2022
Federated Learning on Non-IID Graphs via Structural Knowledge Sharing

Yue Tan, Yixin Liu, Guodong Long et al.

Graph neural networks (GNNs) have shown their superiority in modeling graph data. Owing to the advantages of federated learning, federated graph learning (FGL) enables clients to train strong GNN models in a distributed manner without sharing their private data. A core challenge in federated systems is the non-IID problem, which also widely exists in real-world graph data. For example, local data of clients may come from diverse datasets or even domains, e.g., social networks and molecules, increasing the difficulty for FGL methods to capture commonly shared knowledge and learn a generalized encoder. From real-world graph datasets, we observe that some structural properties are shared by various domains, presenting great potential for sharing structural knowledge in FGL. Inspired by this, we propose FedStar, an FGL framework that extracts and shares the common underlying structure information for inter-graph federated learning tasks. To explicitly extract the structure information rather than encoding them along with the node features, we define structure embeddings and encode them with an independent structure encoder. Then, the structure encoder is shared across clients while the feature-based knowledge is learned in a personalized way, making FedStar capable of capturing more structure-based domain-invariant information and avoiding feature misalignment issues. We perform extensive experiments over both cross-dataset and cross-domain non-IID FGL settings, demonstrating the superiority of FedStar.

LGFeb 4, 2023
A Survey on Deep Learning based Time Series Analysis with Frequency Transformation

Kun Yi, Qi Zhang, Wei Fan et al.

Recently, frequency transformation (FT) has been increasingly incorporated into deep learning models to significantly enhance state-of-the-art accuracy and efficiency in time series analysis. The advantages of FT, such as high efficiency and a global view, have been rapidly explored and exploited in various time series tasks and applications, demonstrating the promising potential of FT as a new deep learning paradigm for time series analysis. Despite the growing attention and the proliferation of research in this emerging field, there is currently a lack of a systematic review and in-depth analysis of deep learning-based time series models with FT. It is also unclear why FT can enhance time series analysis and what its limitations are in the field. To address these gaps, we present a comprehensive review that systematically investigates and summarizes the recent research advancements in deep learning-based time series analysis with FT. Specifically, we explore the primary approaches used in current models that incorporate FT, the types of neural networks that leverage FT, and the representative FT-equipped models in deep time series analysis. We propose a novel taxonomy to categorize the existing methods in this field, providing a structured overview of the diverse approaches employed in incorporating FT into deep learning models for time series analysis. Finally, we highlight the advantages and limitations of FT for time series modeling and identify potential future research directions that can further contribute to the community of time series analysis.

CLMar 4, 2022
ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer for Event-Centric Generation and Classification

Yucheng Zhou, Tao Shen, Xiubo Geng et al.

Generating new events given context with correlated ones plays a crucial role in many event-centric reasoning tasks. Existing works either limit their scope to specific scenarios or overlook event-level correlations. In this paper, we propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for event-centric reasoning. To achieve this, we propose three novel event-centric objectives, i.e., whole event recovering, contrastive event-correlation encoding and prompt-based event locating, which highlight event-level correlations with effective training. The proposed ClarET is applicable to a wide range of event-centric reasoning scenarios, considering its versatility of (i) event-correlation types (e.g., causal, temporal, contrast), (ii) application formulations (i.e., generation and classification), and (iii) reasoning types (e.g., abductive, counterfactual and ending reasoning). Empirical fine-tuning results, as well as zero- and few-shot learning, on 9 benchmarks (5 generation and 4 classification tasks covering 4 reasoning types with diverse event correlations), verify its effectiveness and generalization ability.

LGMar 2, 2022
Personalized Federated Learning With Graph

Fengwen Chen, Guodong Long, Zonghan Wu et al.

Knowledge sharing and model personalization are two key components in the conceptual framework of personalized federated learning (PFL). Existing PFL methods focus on proposing new model personalization mechanisms while simply implementing knowledge sharing by aggregating models from all clients, regardless of their relation graph. This paper aims to enhance the knowledge-sharing process in PFL by leveraging the graph-based structural information among clients. We propose a novel structured federated learning (SFL) framework to learn both the global and personalized models simultaneously using client-wise relation graphs and clients' private data. We cast SFL with graph into a novel optimization problem that can model the client-wise complex relations and graph-based structural topology by a unified framework. Moreover, in addition to using an existing relation graph, SFL could be expanded to learn the hidden relations among clients. Experiments on traffic and image benchmark datasets can demonstrate the effectiveness of the proposed method. All implementation codes are available on Github

LGJan 22, 2023
Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data

Shengchao Chen, Guodong Long, Tao Shen et al.

To tackle the global climate challenge, it urgently needs to develop a collaborative platform for comprehensive weather forecasting on large-scale meteorological data. Despite urgency, heterogeneous meteorological sensors across countries and regions, inevitably causing multivariate heterogeneity and data exposure, become the main barrier. This paper develops a foundation model across regions capable of understanding complex meteorological data and providing weather forecasting. To relieve the data exposure concern across regions, a novel federated learning approach has been proposed to collaboratively learn a brand-new spatio-temporal Transformer-based foundation model across participants with heterogeneous meteorological data. Moreover, a novel prompt learning mechanism has been adopted to satisfy low-resourced sensors' communication and computational constraints. The effectiveness of the proposed method has been demonstrated on classical weather forecasting tasks using three meteorological datasets with multivariate time series.

CLApr 27, 2023
Large Language Models are Strong Zero-Shot Retriever

Tao Shen, Guodong Long, Xiubo Geng et al.

In this work, we propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios. Our method, the Language language model as Retriever (LameR), is built upon no other neural models but an LLM, while breaking brute-force combinations of retrievers with LLMs and lifting the performance of zero-shot retrieval to be very competitive on benchmark datasets. Essentially, we propose to augment a query with its potential answers by prompting LLMs with a composition of the query and the query's in-domain candidates. The candidates, regardless of correct or wrong, are obtained by a vanilla retrieval procedure on the target collection. As a part of the prompts, they are likely to help LLM generate more precise answers by pattern imitation or candidate summarization. Even if all the candidates are wrong, the prompts at least make LLM aware of in-collection patterns and genres. Moreover, due to the low performance of a self-supervised retriever, the LLM-based query augmentation becomes less effective as the retriever bottlenecks the whole pipeline. Therefore, we propose to leverage a non-parametric lexicon-based method (e.g., BM25) as the retrieval module to capture query-document overlap in a literal fashion. As such, LameR makes the retrieval procedure transparent to the LLM, thus circumventing the performance bottleneck.

LGJan 22, 2023
Federated Recommendation with Additive Personalization

Zhiwei Li, Guodong Long, Tianyi Zhou

Building recommendation systems via federated learning (FL) is a new emerging challenge for advancing next-generation Internet service and privacy protection. Existing approaches train shared item embedding by FL while keeping the user embedding private on client side. However, item embedding identical for all clients cannot capture users' individual differences on perceiving the same item and thus leads to poor personalization. Moreover, dense item embedding in FL results in expensive communication cost and latency. To address these challenges, we propose Federated Recommendation with Additive Personalization (FedRAP), which learns a global view of items via FL and a personalized view locally on each user. FedRAP enforces sparsity of the global view to save FL's communication cost and encourages difference between the two views through regularization. We propose an effective curriculum to learn the local and global views progressively with increasing regularization weights. To produce recommendations for an user, FedRAP adds the two views together to obtain a personalized item embedding. FedRAP achieves the best performance in FL setting on multiple benchmarks. It outperforms recent federated recommendation methods and several ablation study baselines.

CVJan 26, 2023
Improving Cross-modal Alignment for Text-Guided Image Inpainting

Yucheng Zhou, Guodong Long

Text-guided image inpainting (TGII) aims to restore missing regions based on a given text in a damaged image. Existing methods are based on a strong vision encoder and a cross-modal fusion model to integrate cross-modal features. However, these methods allocate most of the computation to visual encoding, while light computation on modeling modality interactions. Moreover, they take cross-modal fusion for depth features, which ignores a fine-grained alignment between text and image. Recently, vision-language pre-trained models (VLPM), encapsulating rich cross-modal alignment knowledge, have advanced in most multimodal tasks. In this work, we propose a novel model for TGII by improving cross-modal alignment (CMA). CMA model consists of a VLPM as a vision-language encoder, an image generator and global-local discriminators. To explore cross-modal alignment knowledge for image restoration, we introduce cross-modal alignment distillation and in-sample distribution distillation. In addition, we employ adversarial training to enhance the model to fill the missing region in complicated structures effectively. Experiments are conducted on two popular vision-language datasets. Results show that our model achieves state-of-the-art performance compared with other strong competitors.

CVJan 26, 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning

Yucheng Zhou, Guodong Long

Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.

CLNov 15, 2023
Thread of Thought Unraveling Chaotic Contexts

Yucheng Zhou, Xiubo Geng, Tao Shen et al.

Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In response to these challenges, we introduce the "Thread of Thought" (ThoT) strategy, which draws inspiration from human cognitive processes. ThoT systematically segments and analyzes extended contexts while adeptly selecting pertinent information. This strategy serves as a versatile "plug-and-play" module, seamlessly integrating with various LLMs and prompting techniques. In the experiments, we utilize the PopQA and EntityQ datasets, as well as a Multi-Turn Conversation Response dataset (MTCR) we collected, to illustrate that ThoT significantly improves reasoning performance compared to other prompting techniques.

CVJan 26, 2023
Multimodal Event Transformer for Image-guided Story Ending Generation

Yucheng Zhou, Guodong Long

Image-guided story ending generation (IgSEG) is to generate a story ending based on given story plots and ending image. Existing methods focus on cross-modal feature fusion but overlook reasoning and mining implicit information from story plots and ending image. To tackle this drawback, we propose a multimodal event transformer, an event-based reasoning framework for IgSEG. Specifically, we construct visual and semantic event graphs from story plots and ending image, and leverage event-based reasoning to reason and mine implicit information in a single modality. Next, we connect visual and semantic event graphs and utilize cross-modal fusion to integrate different-modality features. In addition, we propose a multimodal injector to adaptive pass essential information to decoder. Besides, we present an incoherence detection to enhance the understanding context of a story plot and the robustness of graph modeling for our model. Experimental results show that our method achieves state-of-the-art performance for the image-guided story ending generation.

CLJun 1, 2023
Improving the Robustness of Summarization Systems with Dual Augmentation

Xiuying Chen, Guodong Long, Chongyang Tao et al.

A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. In this work, we first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. To create semantic-consistent substitutes, we propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models. Experimental results show that state-of-the-art summarization models have a significant decrease in performance on adversarial and noisy test sets. Next, we analyze the vulnerability of the summarization systems and explore improving the robustness by data augmentation. Specifically, the first brittleness factor we found is the poor understanding of infrequent words in the input. Correspondingly, we feed the encoder with more diverse cases created by SummAttacker in the input space. The other factor is in the latent space, where the attacked inputs bring more variations to the hidden states. Hence, we construct adversarial decoder input and devise manifold softmixing operation in hidden space to introduce more diversity. Experimental results on Gigaword and CNN/DM datasets demonstrate that our approach achieves significant improvements over strong baselines and exhibits higher robustness on noisy, attacked, and clean datasets.

AISep 21, 2023
Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution

Shuang Ao, Tianyi Zhou, Guodong Long et al.

Throughout long history, natural species have learned to survive by evolving their physical structures adaptive to the environment changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through ``morphology-environment co-evolution (MECE)'', in which the morphology keeps being updated to adapt to the changing environment, while the environment is modified progressively to bring new challenges and stimulate the improvement of the morphology. This leads to a curriculum to train generalizable RL, whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, which are solely based on the learning dynamics of the RL agent; (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE exhibit significantly better generalization performance in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between the morphology and environment is the key to the success.

LGJun 6, 2023
Personalization Disentanglement for Federated Learning: An explainable perspective

Peng Yan, Guodong Long

Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL via explicit disentangling latent representations into two parts to capture the shared knowledge and client-specific personalization, which leads to more reliable and effective PFL. The disentanglement is achieved by a novel Federated Dual Variational Autoencoder (FedDVA), which employs two encoders to infer the two types of representations. FedDVA can produce a better understanding of the trade-off between global knowledge sharing and local personalization in PFL. Moreover, it can be integrated with existing FL methods and turn them into personalized models for heterogeneous downstream tasks. Extensive experiments validate the advantages caused by disentanglement and show that models trained with disentangled representations substantially outperform those vanilla methods.

CLNov 11, 2022
CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification

Yang Li, Canran Xu, Guodong Long et al.

Recently, prefix-tuning was proposed to efficiently adapt pre-trained language models to a broad spectrum of natural language classification tasks. It leverages soft prefix as task-specific indicators and language verbalizers as categorical-label mentions to narrow the formulation gap from pre-training language models. However, when the label space increases considerably (i.e., many-class classification), such a tuning technique suffers from a verbalizer ambiguity problem since the many-class labels are represented by semantic-similar verbalizers in short language phrases. To overcome this, inspired by the human-decision process that the most ambiguous classes would be mulled over for each instance, we propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix), for many-class classification. Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification. We conduct experiments on many-class benchmark datasets in both the fully supervised setting and the few-shot setting, which indicates that our model outperforms former baselines.

CVJul 10, 2023
One-Shot Pruning for Fast-adapting Pre-trained Models on Devices

Haiyan Zhao, Guodong Long

Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the model from scratch can pose a practical challenge given the limited resources of each downstream task or device. To tackle this issue, we present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task. Specifically, we create a score mask using the pruned models of similar tasks to identify task-specific filters/nodes in the pre-trained model for the new task. Based on this mask, we conduct a single round of pruning to extract a suitably-sized sub-network that can quickly adapt to the new task with only a few training iterations. Our experimental analysis demonstrates the effectiveness of the proposed method on the convolutional neural networks (CNNs) and vision transformers (ViT) with various datasets. The proposed method consistently outperforms popular pruning baseline methods in terms of accuracy and efficiency when dealing with diverse downstream tasks with different memory constraints.

CLOct 27, 2022
Unsupervised Knowledge Graph Construction and Event-centric Knowledge Infusion for Scientific NLI

Chenglin Wang, Yucheng Zhou, Guodong Long et al.

With the advance of natural language inference (NLI), a rising demand for NLI is to handle scientific texts. Existing methods depend on pre-trained models (PTM) which lack domain-specific knowledge. To tackle this drawback, we introduce a scientific knowledge graph to generalize PTM to scientific domain. However, existing knowledge graph construction approaches suffer from some drawbacks, i.e., expensive labeled data, failure to apply in other domains, long inference time and difficulty extending to large corpora. Therefore, we propose an unsupervised knowledge graph construction method to build a scientific knowledge graph (SKG) without any labeled data. Moreover, to alleviate noise effect from SKG and complement knowledge in sentences better, we propose an event-centric knowledge infusion method to integrate external knowledge into each event that is a fine-grained semantic unit in sentences. Experimental results show that our method achieves state-of-the-art performance and the effectiveness and reliability of SKG.

53.3IRMar 18Code
Learning Evolving Preferences: A Federated Continual Framework for User-Centric Recommendation

Chunxu Zhang, Zhiheng Xue, Guodong Long et al.

User-centric recommendation has become essential for delivering personalized services, as it enables systems to adapt to users' evolving behaviors while respecting their long-term preferences and privacy constraints. Although federated learning offers a promising alternative to centralized training, existing approaches largely overlook user behavior dynamics, leading to temporal forgetting and weakened collaborative personalization. In this work, we propose FCUCR, a federated continual recommendation framework designed to support long-term personalization in a privacy-preserving manner. To address temporal forgetting, we introduce a time-aware self-distillation strategy that implicitly retains historical preferences during local model updates. To tackle collaborative personalization under heterogeneous user data, we design an inter-user prototype transfer mechanism that enriches each client's representation using knowledge from similar users while preserving individual decision logic. Extensive experiments on four public benchmarks demonstrate the superior effectiveness of our approach, along with strong compatibility and practical applicability. Code is available.

LGJan 27, 2023
Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks

Haiyan Zhao, Tianyi Zhou, Guodong Long et al.

As a few large-scale pre-trained models become the major choices of various applications, new challenges arise for model pruning, e.g., can we avoid pruning the same model from scratch for every downstream task? How to reuse the pruning results of previous tasks to accelerate the pruning for a new task? To address these challenges, we create a small model for a new task from the pruned models of similar tasks. We show that a few fine-tuning steps on this model suffice to produce a promising pruned-model for the new task. We study this ''meta-pruning'' from nearest tasks on two major classes of pre-trained models, convolutional neural network (CNN) and vision transformer (ViT), under a limited budget of pruning iterations. Our study begins by investigating the overlap of pruned models for similar tasks and how the overlap changes over different layers and blocks. Inspired by these discoveries, we develop a simple but effective ''Meta-Vote Pruning (MVP)'' method that significantly reduces the pruning iterations for a new task by initializing a sub-network from the pruned models of its nearest tasks. In experiments, we demonstrate MVP's advantages in accuracy, efficiency, and generalization through extensive empirical studies and comparisons with popular pruning methods over several datasets.

71.9LGApr 8
Bi-level Heterogeneous Learning for Time Series Foundation Models: A Federated Learning Approach

Shengchao Chen, Guodong Long, Dikai Liu et al.

Heterogeneity in time series data is more pronounced than in vision or language, as temporal dynamics vary substantially across domains and tasks. Existing efforts on training time series foundation models (TSFMs) from scratch are often trained with mixed-batch strategies that merge large-scale datasets, which can cause gradient conflicts and degrade representation quality. To address this, we propose a fine-grained learning method that distills invariant knowledge from heterogeneous series while reducing cross-domain interference. We characterize heterogeneity at two levels: inter-domain and intra-domain. To tackle this bi-level heterogeneity, we design a federated learning method that mitigates intra-domain conflicts by enforcing domain-invariant and semantically consistent representations through local regularization, and addresses inter-domain discrepancies by enhancing cross-domain collaboration via domain-aware aggregation. Experiments across diverse benchmarks show that TSFMs trained with our method consistently outperform both centralized and federated TSFM baselines in point and probabilistic forecasting, while also achieving competitive zero-shot performance at scale, offering a flexible pathway for training TSFMs from scratch in heterogeneous environments.

CVMar 17, 2025Code
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models

Haoyang Li, Liang Wang, Chao Wang et al.

The Base-New Trade-off (BNT) problem universally exists during the optimization of CLIP-based prompt tuning, where continuous fine-tuning on base (target) classes leads to a simultaneous decrease of generalization ability on new (unseen) classes. Existing approaches attempt to regulate the prompt tuning process to balance BNT by appending constraints. However, imposed on the same target prompt, these constraints fail to fully avert the mutual exclusivity between the optimization directions for base and new. As a novel solution to this challenge, we propose the plug-and-play Dual-Prompt Collaboration (DPC) framework, the first that decoupling the optimization processes of base and new tasks at the prompt level. Specifically, we clone a learnable parallel prompt based on the backbone prompt, and introduce a variable Weighting-Decoupling framework to independently control the optimization directions of dual prompts specific to base or new tasks, thus avoiding the conflict in generalization. Meanwhile, we propose a Dynamic Hard Negative Optimizer, utilizing dual prompts to construct a more challenging optimization task on base classes for enhancement. For interpretability, we prove the feature channel invariance of the prompt vector during the optimization process, providing theoretical support for the Weighting-Decoupling of DPC. Extensive experiments on multiple backbones demonstrate that DPC can significantly improve base performance without introducing any external knowledge beyond the base classes, while maintaining generalization to new classes. Code is available at: https://github.com/JREion/DPC.

LGApr 16, 2024Code
What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning

Zhihong Deng, Jing Jiang, Guodong Long et al.

In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair.

CVFeb 25
Meta-FC: Meta-Learning with Feature Consistency for Robust and Generalizable Watermarking

Yuheng Li, Weitong Chen, Chengcheng Zhu et al.

Deep learning-based watermarking has made remarkable progress in recent years. To achieve robustness against various distortions, current methods commonly adopt a training strategy where a \underline{\textbf{s}}ingle \underline{\textbf{r}}andom \underline{\textbf{d}}istortion (SRD) is chosen as the noise layer in each training batch. However, the SRD strategy treats distortions independently within each batch, neglecting the inherent relationships among different types of distortions and causing optimization conflicts across batches. As a result, the robustness and generalizability of the watermarking model are limited. To address this issue, we propose a novel training strategy that enhances robustness and generalization via \underline{\textbf{meta}}-learning with \underline{\textbf{f}}eature \underline{\textbf{c}}onsistency (Meta-FC). Specifically, we randomly sample multiple distortions from the noise pool to construct a meta-training task, while holding out one distortion as a simulated ``unknown'' distortion for the meta-testing phase. Through meta-learning, the model is encouraged to identify and utilize neurons that exhibit stable activations across different types of distortions, mitigating the optimization conflicts caused by the random sampling of diverse distortions in each batch. To further promote the transformation of stable activations into distortion-invariant representations, we introduce a feature consistency loss that constrains the decoded features of the same image subjected to different distortions to remain consistent. Extensive experiments demonstrate that, compared to the SRD training strategy, Meta-FC improves the robustness and generalization of various watermarking models by an average of 1.59\%, 4.71\%, and 2.38\% under high-intensity, combined, and unknown distortions.

CVAug 4, 2025Code
Raw Data Matters: Enhancing Prompt Tuning by Internal Augmentation on Vision-Language Models

Haoyang Li, Liang Wang, Chao Wang et al.

For CLIP-based prompt tuning, introducing more data as additional knowledge for enhancing fine-tuning process is proved to be an effective approach. Existing data amplification strategies for prompt tuning typically rely on external knowledge (e.g., large language models or pre-structured knowledge bases), resulting in higher costs for data collection and processing, while generally ignoring further utilization of features in image modality. To address this, we propose Augmentation-driven Prompt Tuning (AugPT), a self-contained distillation-based prompt tuning approach using only internal augmentation on raw dataset to better exploit known features. Specifically, AugPT employs self-supervised augmentation on unlabeled images in the training set, and introduces a novel gating mechanism based on consensus test, reusing the pre-trained prompt tuning backbone model to spontaneously filter noisy samples, further enhancing the quality of augmented views. Extensive experiments validate that AugPT simultaneously enhances model performance and generalization capability without using appended external knowledge. The code of AugPT is available at: https://github.com/JREion/AugPT .

CVMar 23, 2025Code
MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models

Haoyang Li, Siyu Zhou, Liang Wang et al.

Though CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research focuses on reconstructing the model architecture, e.g., additional loss calculation and meta-networks. These approaches generally lead to increased complexity and extended training cost. To maintain the efficiency of the tuning process, we propose plug-and-play Model-Agnostic Optimization (MAO) for prompt tuning. Without altering any components of the prompt tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to boost the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments on MAO demonstrate its outstanding performance and efficiency. The code of MAO is available at: https://github.com/JREion/M.A.O .

CLMay 12, 2023Code
Synergistic Interplay between Search and Large Language Models for Information Retrieval

Jiazhan Feng, Chongyang Tao, Xiubo Geng et al.

Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language models (LLMs) has further revolutionized the IR field by enabling users to interact with search systems in natural languages. In this paper, we explore the advantages and disadvantages of LLMs and RMs, highlighting their respective strengths in understanding user-issued queries and retrieving up-to-date information. To leverage the benefits of both paradigms while circumventing their limitations, we propose InteR, a novel framework that facilitates information refinement through synergy between RMs and LLMs. InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections and enables LLMs to enhance prompt formulation using retrieved documents. This iterative refinement process augments the inputs of RMs and LLMs, leading to more accurate retrieval. Experiments on large-scale retrieval benchmarks involving web search and low-resource retrieval tasks demonstrate that InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods, even those using relevance judgment. Source code is available at https://github.com/Cyril-JZ/InteR

LGSep 7, 2021Code
Sequential Diagnosis Prediction with Transformer and Ontological Representation

Xueping Peng, Guodong Long, Tao Shen et al.

Sequential diagnosis prediction on the Electronic Health Record (EHR) has been proven crucial for predictive analytics in the medical domain. EHR data, sequential records of a patient's interactions with healthcare systems, has numerous inherent characteristics of temporality, irregularity and data insufficiency. Some recent works train healthcare predictive models by making use of sequential information in EHR data, but they are vulnerable to irregular, temporal EHR data with the states of admission/discharge from hospital, and insufficient data. To mitigate this, we propose an end-to-end robust transformer-based model called SETOR, which exploits neural ordinary differential equation to handle both irregular intervals between a patient's visits with admitted timestamps and length of stay in each visit, to alleviate the limitation of insufficient data by integrating medical ontology, and to capture the dependencies between the patient's visits by employing multi-layer transformer blocks. Experiments conducted on two real-world healthcare datasets show that, our sequential diagnoses prediction model SETOR not only achieves better predictive results than previous state-of-the-art approaches, irrespective of sufficient or insufficient training data, but also derives more interpretable embeddings of medical codes. The experimental codes are available at the GitHub repository (https://github.com/Xueping/SETOR).

LGAug 25, 2021Code
TraverseNet: Unifying Space and Time in Message Passing for Traffic Forecasting

Zonghan Wu, Da Zheng, Shirui Pan et al.

This paper aims to unify spatial dependency and temporal dependency in a non-Euclidean space while capturing the inner spatial-temporal dependencies for traffic data. For spatial-temporal attribute entities with topological structure, the space-time is consecutive and unified while each node's current status is influenced by its neighbors' past states over variant periods of each neighbor. Most spatial-temporal neural networks for traffic forecasting study spatial dependency and temporal correlation separately in processing, gravely impaired the spatial-temporal integrity, and ignore the fact that the neighbors' temporal dependency period for a node can be delayed and dynamic. To model this actual condition, we propose TraverseNet, a novel spatial-temporal graph neural network, viewing space and time as an inseparable whole, to mine spatial-temporal graphs while exploiting the evolving spatial-temporal dependencies for each node via message traverse mechanisms. Experiments with ablation and parameter studies have validated the effectiveness of the proposed TraverseNet, and the detailed implementation can be found from https://github.com/nnzhan/TraverseNet.

LGAug 19, 2021Code
Multi-Center Federated Learning: Clients Clustering for Better Personalization

Guodong Long, Ming Xie, Tao Shen et al.

Personalized decision-making can be implemented in a Federated learning (FL) framework that can collaboratively train a decision model by extracting knowledge across intelligent clients, e.g. smartphones or enterprises. FL can mitigate the data privacy risk of collaborative training since it merely collects local gradients from users without access to their data. However, FL is fragile in the presence of statistical heterogeneity that is commonly encountered in personalized decision-making, e.g., non-IID data over different clients. Existing FL approaches usually update a single global model to capture the shared knowledge of all users by aggregating their gradients, regardless of the discrepancy between their data distributions. By comparison, a mixture of multiple global models could capture the heterogeneity across various clients if assigning the client to different global models (i.e., centers) in FL. To this end, we propose a novel multi-center aggregation mechanism to cluster clients using their models' parameters. It learns multiple global models from data as the cluster centers, and simultaneously derives the optimal matching between users and centers. We then formulate it as an optimization problem that can be efficiently solved by a stochastic expectation maximization (EM) algorithm. Experiments on multiple benchmark datasets of FL show that our method outperforms several popular baseline methods. The experimental source codes are publicly available on the Github repository https://github.com/mingxuts/multi-center-fed-learning .

LGJun 21, 2020Code
A Universal Representation Transformer Layer for Few-Shot Image Classification

Lu Liu, William Hamilton, Guodong Long et al.

Few-shot classification aims to recognize unseen classes when presented with only a small number of samples. We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting is to effectively integrate the feature representations from the diverse set of training domains. Here, we propose a Universal Representation Transformer (URT) layer, that meta-learns to leverage universal features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset. Specifically, it achieves top-performance on the highest number of data sources compared to competing methods. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization. Our code is available at https://github.com/liulu112601/URT.

LGJun 15, 2020Code
Self-Attention Enhanced Patient Journey Understanding in Healthcare System

Xueping Peng, Guodong Long, Tao Shen et al.

Understanding patients' journeys in healthcare system is a fundamental prepositive task for a broad range of AI-based healthcare applications. This task aims to learn an informative representation that can comprehensively encode hidden dependencies among medical events and its inner entities, and then the use of encoding outputs can greatly benefit the downstream application-driven tasks. A patient journey is a sequence of electronic health records (EHRs) over time that is organized at multiple levels: patient, visits and medical codes. The key challenge of patient journey understanding is to design an effective encoding mechanism which can properly tackle the aforementioned multi-level structured patient journey data with temporal sequential visits and a set of medical codes. This paper proposes a novel self-attention mechanism that can simultaneously capture the contextual and temporal relationships hidden in patient journeys. A multi-level self-attention network (MusaNet) is specifically designed to learn the representations of patient journeys that is used to be a long sequence of activities. The MusaNet is trained in end-to-end manner using the training data derived from EHRs. We evaluated the efficacy of our method on two medical application tasks with real-world benchmark datasets. The results have demonstrated the proposed MusaNet produces higher-quality representations than state-of-the-art baseline methods. The source code is available in https://github.com/xueping/MusaNet.

LGFeb 24, 2020Code
Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification

Wensi Tang, Guodong Long, Lu Liu et al.

The Receptive Field (RF) size has been one of the most important factors for One Dimensional Convolutional Neural Networks (1D-CNNs) on time series classification tasks. Large efforts have been taken to choose the appropriate size because it has a huge influence on the performance and differs significantly for each dataset. In this paper, we propose an Omni-Scale block (OS-block) for 1D-CNNs, where the kernel sizes are decided by a simple and universal rule. Particularly, it is a set of kernel sizes that can efficiently cover the best RF size across different datasets via consisting of multiple prime numbers according to the length of the time series. The experiment result shows that models with the OS-block can achieve a similar performance as models with the searched optimal RF size and due to the strong optimal RF size capture ability, simple 1D-CNN models with OS-block achieves the state-of-the-art performance on four time series benchmarks, including both univariate and multivariate data from multiple domains. Comprehensive analysis and discussions shed light on why the OS-block can capture optimal RF sizes across different datasets. Code available [https://github.com/Wensi-Tang/OS-CNN]

LGSep 11, 2019Code
Learning to Propagate for Graph Meta-Learning

Lu Liu, Tianyi Zhou, Guodong Long et al.

Meta-learning extracts common knowledge from learning different tasks and uses it for unseen tasks. It can significantly improve tasks that suffer from insufficient training data, e.g., few shot learning. In most meta-learning methods, tasks are implicitly related by sharing parameters or optimizer. In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations of their output dimensions (e.g., classes) can significantly improve few shot learning. The graph's structure is usually free or cheap to obtain but has rarely been explored in previous works. We develop a novel meta-learner of this type for prototype-based classification, in which a prototype is generated for each class, such that the nearest neighbor search among the prototypes produces an accurate classification. The meta-learner, called "Gated Propagation Network (GPN)", learns to propagate messages between prototypes of different classes on the graph, so that learning the prototype of each class benefits from the data of other related classes. In GPN, an attention mechanism aggregates messages from neighboring classes of each class, with a gate choosing between the aggregated message and the message from the class itself. We train GPN on a sequence of tasks from many-shot to few shot generated by subgraph sampling. During training, it is able to reuse and update previously achieved prototypes from the memory in a life-long learning cycle. In experiments, under different training-test discrepancy and test task generation settings, GPN outperforms recent meta-learning methods on two benchmark datasets. The code of GPN and dataset generation is available at https://github.com/liulu112601/Gated-Propagation-Net.

LGJan 3, 2019Code
A Comprehensive Survey on Graph Neural Networks

Zonghan Wu, Shirui Pan, Fengwen Chen et al.

Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.

LGMar 28, 2024
Dual-Personalizing Adapter for Federated Foundation Models

Yiyuan Yang, Guodong Long, Tao Shen et al.

Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning diverse instruction data. Notably, federated foundation models (FedFM) emerge as a privacy preservation method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. To alleviate communication and computation overhead, parameter-efficient methods are introduced for efficiency, and some research adapted personalization methods to FedFM for better user preferences alignment. However, a critical gap in existing research is the neglect of test-time distribution shifts in real-world applications, and conventional methods for test-time distribution shifts in personalized FL are less effective for FedFM due to their failure to adapt to complex distribution shift scenarios and the requirement to train all parameters. To bridge this gap, we refine the setting in FedFM, termed test-time personalization, which aims to learn personalized federated foundation models on clients while effectively handling test-time distribution shifts simultaneously. To address challenges in this setting, we explore a simple yet effective solution, a Federated Dual-Personalizing Adapter (FedDPA) architecture. By co-working with a foundation model, a global adapter and a local adapter jointly tackle the test-time distribution shifts and client-specific personalization. Additionally, we introduce an instance-wise dynamic weighting mechanism that dynamically integrates the global and local adapters for each test instance during inference, facilitating effective test-time personalization. The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks.

LGDec 5, 2023
Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey

Shengchao Chen, Guodong Long, Jing Jiang et al.

As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data. Cutting-edge performance on specific tasks within narrower spatio-temporal scales has been achieved recently through DL. The rise of large models, specifically large language models (LLMs), has enabled fine-tuning processes that yield remarkable outcomes across various downstream tasks, thereby propelling the advancement of general AI. However, we are still navigating the initial stages of crafting general AI for weather and climate. In this survey, we offer an exhaustive, timely overview of state-of-the-art AI methodologies specifically engineered for weather and climate data, with a special focus on time series and text data. Our primary coverage encompasses four critical aspects: types of weather and climate data, principal model architectures, model scopes and applications, and datasets for weather and climate. Furthermore, in relation to the creation and application of foundation models for weather and climate data understanding, we delve into the field's prevailing challenges, offer crucial insights, and propose detailed avenues for future research. This comprehensive approach equips practitioners with the requisite knowledge to make substantial progress in this domain. Our survey encapsulates the most recent breakthroughs in research on large, data-driven models for weather and climate data understanding, emphasizing robust foundations, current advancements, practical applications, crucial resources, and prospective research opportunities.

59.2LGMay 1
Federated Weather Modeling on Sensor Data

Shengchao Chen, Guodong Long

Federated weather modeling on sensor data is a distributed system underpinned by federated learning, enabling multiple sensor data sources, including ground weather stations, satellites and IoT devices, to collaboratively train deep learning models without sharing raw data. This method safeguards data privacy and security while leverages diverse, geographically distributed datasets to improve the accuracy and robustness of global/regional weather modeling tasks such as forecasting and anomaly detection.

AO-PHMay 24, 2024
Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation Models

Shengchao Chen, Guodong Long, Jing Jiang et al.

This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs, that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological data on devices while keeping high efficiency. Concretely, we introduce a lightweight personalized adapter into PLMs and endows it with weather pattern awareness. During communication between clients and the server, low-rank-based transmission is performed to effectively fuse the global knowledge among devices while maintaining high communication efficiency and ensuring privacy. Experiments on real-wold dataset show that LM-Weather outperforms the state-of-the-art results by a large margin across various tasks (e.g., forecasting and imputation at different scales). We provide extensive and in-depth analyses experiments, which verify that LM-Weather can (1) indeed leverage sequential knowledge from natural language to accurately handle meteorological sequence, (2) allows each devices obtain highly customized models under significant heterogeneity, and (3) generalize under data-limited and out-of-distribution (OOD) scenarios.

LGDec 12, 2024
Federated Foundation Models on Heterogeneous Time Series

Shengchao Chen, Guodong Long, Jing Jiang et al.

Training a general-purpose time series foundation models with robust generalization capabilities across diverse applications from scratch is still an open challenge. Efforts are primarily focused on fusing cross-domain time series datasets to extract shared subsequences as tokens for training models on Transformer architecture. However, due to significant statistical heterogeneity across domains, this cross-domain fusing approach doesn't work effectively as the same as fusing texts and images. To tackle this challenge, this paper proposes a novel federated learning approach to address the heterogeneity in time series foundation models training, namely FFTS. Specifically, each data-holding organization is treated as an independent client in a collaborative learning framework with federated settings, and then many client-specific local models will be trained to preserve the unique characteristics per dataset. Moreover, a new regularization mechanism will be applied to both client-side and server-side, thus to align the shared knowledge across heterogeneous datasets from different domains. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed federated learning approach. The newly learned time series foundation models achieve superior generalization capabilities on cross-domain time series analysis tasks, including forecasting, imputation, and anomaly detection.

CROct 18, 2024
DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks

Hao Sui, Bing Chen, Jiale Zhang et al.

Recent studies have revealed that GNNs are highly susceptible to multiple adversarial attacks. Among these, graph backdoor attacks pose one of the most prominent threats, where attackers cause models to misclassify by learning the backdoored features with injected triggers and modified target labels during the training phase. Based on the features of the triggers, these attacks can be categorized into out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks, triggers with notable differences from the clean sample feature distributions constitute OOD backdoor attacks, whereas the triggers in ID backdoor attacks are nearly identical to the clean sample feature distributions. Existing methods can successfully defend against OOD backdoor attacks by comparing the feature distribution of triggers and clean samples but fail to mitigate stealthy ID backdoor attacks. Due to the lack of proper supervision signals, the main task accuracy is negatively affected in defending against ID backdoor attacks. To bridge this gap, we propose DMGNN against OOD and ID graph backdoor attacks that can powerfully eliminate stealthiness to guarantee defense effectiveness and improve the model performance. Specifically, DMGNN can easily identify the hidden ID and OOD triggers via predicting label transitions based on counterfactual explanation. To further filter the diversity of generated explainable graphs and erase the influence of the trigger features, we present a reverse sampling pruning method to screen and discard the triggers directly on the data level. Extensive experimental evaluations on open graph datasets demonstrate that DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance (within 3.5%).