IRJan 16, 2023Code
Dual Personalization on Federated RecommendationChunxu Zhang, Guodong Long, Tianyi Zhou et al.
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available.
IRAug 16, 2024Code
Personalized Federated Collaborative Filtering: A Variational AutoEncoder ApproachZhiwei Li, Guodong Long, Tianyi Zhou et al.
Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework with preserving privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information into a user embedding vector. However, the user embedding is usually insufficient to preserve the rich information of the fine-grained personalization across heterogeneous clients. This paper proposes a novel personalized FedCF method by preserving users' personalized information into a latent variable and a neural model simultaneously. Specifically, we decompose the modeling of user knowledge into two encoders, each designed to capture shared knowledge and personalized knowledge separately. A personalized gating network is then applied to balance personalization and generalization between the global and local encoders. Moreover, to effectively train the proposed framework, we model the CF problem as a specialized Variational AutoEncoder (VAE) task by integrating user interaction vector reconstruction with missing value prediction. The decoder is trained to reconstruct the implicit feedback from items the user has interacted with, while also predicting items the user might be interested in but has not yet interacted with. Experimental results on benchmark datasets demonstrate that the proposed method outperforms other baseline methods, showcasing superior performance. Our code is available at https://github.com/mtics/FedDAE.
LGJul 4, 2023
Causal Reinforcement Learning: A SurveyZhihong Deng, Jing Jiang, Guodong Long et al. · pku
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.
LGApr 9, 2023
Does Continual Learning Equally Forget All Parameters?Haiyan Zhao, Tianyi Zhou, Guodong Long et al. · uw
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks. Although it can be alleviated by repeatedly replaying buffered data, the every-step replay is time-consuming. In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. Our proposed metrics show that only a few modules are more task-specific and sensitively alter between tasks, while others can be shared across tasks as common knowledge. Hence, we attribute forgetting mainly to the former and find that finetuning them only on a small buffer at the end of any CL method can bring non-trivial improvement. Due to the small number of finetuned parameters, such ``Forgetting Prioritized Finetuning (FPF)'' is efficient in computation. We further propose a more efficient and simpler method that entirely removes the every-step replay and replaces them by only $k$-times of FPF periodically triggered during CL. Surprisingly, this ``$k$-FPF'' performs comparably to FPF and outperforms the SOTA CL methods but significantly reduces their computational overhead and cost. In experiments on several benchmarks of class- and domain-incremental CL, FPF consistently improves existing CL methods by a large margin, and $k$-FPF further excels in efficiency without degrading the accuracy. We also empirically studied the impact of buffer size, epochs per task, and finetuning modules on the cost and accuracy of our methods.
LGNov 23, 2022
Federated Learning on Non-IID Graphs via Structural Knowledge SharingYue Tan, Yixin Liu, Guodong Long et al.
Graph neural networks (GNNs) have shown their superiority in modeling graph data. Owing to the advantages of federated learning, federated graph learning (FGL) enables clients to train strong GNN models in a distributed manner without sharing their private data. A core challenge in federated systems is the non-IID problem, which also widely exists in real-world graph data. For example, local data of clients may come from diverse datasets or even domains, e.g., social networks and molecules, increasing the difficulty for FGL methods to capture commonly shared knowledge and learn a generalized encoder. From real-world graph datasets, we observe that some structural properties are shared by various domains, presenting great potential for sharing structural knowledge in FGL. Inspired by this, we propose FedStar, an FGL framework that extracts and shares the common underlying structure information for inter-graph federated learning tasks. To explicitly extract the structure information rather than encoding them along with the node features, we define structure embeddings and encode them with an independent structure encoder. Then, the structure encoder is shared across clients while the feature-based knowledge is learned in a personalized way, making FedStar capable of capturing more structure-based domain-invariant information and avoiding feature misalignment issues. We perform extensive experiments over both cross-dataset and cross-domain non-IID FGL settings, demonstrating the superiority of FedStar.
CLMar 20, 2022
Perceiving the World: Question-guided Reinforcement Learning for Text-based GamesYunqiu Xu, Meng Fang, Ling Chen et al.
Text-based games provide an interactive way to study natural language processing. While deep reinforcement learning has shown effectiveness in developing the game playing agent, the low sample efficiency and the large action space remain to be the two major challenges that hinder the DRL from being applied in the real world. In this paper, we address the challenges by introducing world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment. We then propose a two-phase training framework to decouple language learning from reinforcement learning, which further improves the sample efficiency. The experimental results show that the proposed method significantly improves the performance and sample efficiency. Besides, it shows robustness against compound error and limited pre-training data.
CLJun 1, 2023
Improving the Robustness of Summarization Systems with Dual AugmentationXiuying Chen, Guodong Long, Chongyang Tao et al.
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. In this work, we first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. To create semantic-consistent substitutes, we propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models. Experimental results show that state-of-the-art summarization models have a significant decrease in performance on adversarial and noisy test sets. Next, we analyze the vulnerability of the summarization systems and explore improving the robustness by data augmentation. Specifically, the first brittleness factor we found is the poor understanding of infrequent words in the input. Correspondingly, we feed the encoder with more diverse cases created by SummAttacker in the input space. The other factor is in the latent space, where the attacked inputs bring more variations to the hidden states. Hence, we construct adversarial decoder input and devise manifold softmixing operation in hidden space to introduce more diversity. Experimental results on Gigaword and CNN/DM datasets demonstrate that our approach achieves significant improvements over strong baselines and exhibits higher robustness on noisy, attacked, and clean datasets.
LGJan 27, 2023
Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream TasksHaiyan Zhao, Tianyi Zhou, Guodong Long et al.
As a few large-scale pre-trained models become the major choices of various applications, new challenges arise for model pruning, e.g., can we avoid pruning the same model from scratch for every downstream task? How to reuse the pruning results of previous tasks to accelerate the pruning for a new task? To address these challenges, we create a small model for a new task from the pruned models of similar tasks. We show that a few fine-tuning steps on this model suffice to produce a promising pruned-model for the new task. We study this ''meta-pruning'' from nearest tasks on two major classes of pre-trained models, convolutional neural network (CNN) and vision transformer (ViT), under a limited budget of pruning iterations. Our study begins by investigating the overlap of pruned models for similar tasks and how the overlap changes over different layers and blocks. Inspired by these discoveries, we develop a simple but effective ''Meta-Vote Pruning (MVP)'' method that significantly reduces the pruning iterations for a new task by initializing a sub-network from the pruned models of its nearest tasks. In experiments, we demonstrate MVP's advantages in accuracy, efficiency, and generalization through extensive empirical studies and comparisons with popular pruning methods over several datasets.
LGMay 29, 2022
Diminishing Empirical Risk Minimization for Unsupervised Anomaly DetectionShaoshen Wang, Yanbin Liu, Ling Chen et al.
Unsupervised anomaly detection (AD) is a challenging task in realistic applications. Recently, there is an increasing trend to detect anomalies with deep neural networks (DNN). However, most popular deep AD detectors cannot protect the network from learning contaminated information brought by anomalous data, resulting in unsatisfactory detection performance and overfitting issues. In this work, we identify one reason that hinders most existing DNN-based anomaly detection methods from performing is the wide adoption of the Empirical Risk Minimization (ERM). ERM assumes that the performance of an algorithm on an unknown distribution can be approximated by averaging losses on the known training set. This averaging scheme thus ignores the distinctions between normal and anomalous instances. To break through the limitations of ERM, we propose a novel Diminishing Empirical Risk Minimization (DERM) framework. Specifically, DERM adaptively adjusts the impact of individual losses through a well-devised aggregation strategy. Theoretically, our proposed DERM can directly modify the gradient contribution of each individual loss in the optimization process to suppress the influence of outliers, leading to a robust anomaly detector. Empirically, DERM outperformed the state-of-the-art on the unsupervised AD benchmark consisting of 18 datasets.
66.2LGMay 23
Beyond the Aggregation Dilemma: Prior-Retaining Decoupled Learning for Multimodal GraphsHao Yan, Xuanru Wang, Jun Yin et al.
Multimodal Attributed Graph Learning (MAGL) integrates intrinsic node attributes with structural topology via graph aggregation. However, as pretrained encoders evolve into Large Foundation Models (LFMs), the landscape of MAGL fundamentally shifts: under high-confidence LFM priors, mandatory aggregation introduces topological noise that overwhelms discriminative signals, triggering a counter-intuitive performance inversion where sophisticated MAGL architectures underperform simple topology-agnostic MLPs. Through systematic empirical and theoretical analysis, we identify that this inversion stems from a fundamental aggregation dilemma characterized by two concurrent pathologies: (1) Representational Pathology (SNR Degradation) - mandatory aggregation dilutes robust intrinsic features with topological noise, causing the noise penalty to outweigh its collaborative benefit; and (2) Optimization Pathology (Gradient Starvation) - topological aggregation attenuates gradient flow, while a shared task loss causes dominant modalities to prematurely suppress weaker ones. To resolve this dilemma, we propose SUPRA (Shared-Unique Prior-Retaining Architecture), a decoupled dual-pathway paradigm. SUPRA processes modality-specific features through topology-agnostic MLPs while capturing structural synergy via a lightweight shared GNN, with auxiliary deep supervision counteracting gradient starvation. Extensive evaluations demonstrate that SUPRA achieves state-of-the-art performance while requiring 3.5x lower peak GPU memory and up to 4.4x faster training time than Multimodal Graph Transformers.
MAJul 24, 2025Code
Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph GenerationShiyuan Li, Yixin Liu, Qingsong Wen et al.
Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. The effectiveness of MAS is critically dependent on its collaboration topology, which has become a focal point for automated design research. However, existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures, significantly limiting their adaptability to task-specific requirements. To address these limitations, we reframe MAS design as a conditional autoregressive graph generation task, where both the system composition and structure are designed jointly. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch. Conditioned on a natural language task query, ARG-Designer sequentially and dynamically determines the required number of agents, selects their appropriate roles from an extensible pool, and establishes the optimal communication links between them. This generative approach creates a customized topology in a flexible and extensible manner, precisely tailored to the unique demands of different tasks. Extensive experiments across six diverse benchmarks demonstrate that ARG-Designer not only achieves state-of-the-art performance but also enjoys significantly greater token efficiency and enhanced extensibility. The source code of ARG-Designer is available at https://github.com/Shiy-Li/ARG-Designer.
LGMay 17, 2025Code
Exploring Criteria of Loss Reweighting to Enhance LLM UnlearningPuning Yang, Qizhou Wang, Zhuo Huang et al.
Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs). However, their exact functionalities are left unclear and the optimal strategy remains an open question, thus impeding the understanding and improvement of existing methodologies. In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance -- the former indicates that those insufficiently optimized data should be emphasized, while the latter stresses some critical data that are most influential for loss minimization. To study their usefulness, we design specific reweighting strategies for each goal and evaluate their respective effects on unlearning. We conduct extensive empirical analyses on well-established benchmarks, and summarize some important observations as follows: (i) Saturation enhances efficacy more than importance-based reweighting, and their combination can yield additional improvements. (ii) Saturation typically allocates lower weights to data with lower likelihoods, whereas importance-based reweighting does the opposite. (iii) The efficacy of unlearning is also largely influenced by the smoothness and granularity of the weight distributions. Based on these findings, we propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance. Empirical results on extensive datasets validate the efficacy of our method, potentially bridging existing research gaps and indicating directions for future research. Our code is available at https://github.com/tmlr-group/SatImp.
68.6IRMay 16
Echoes in Filter Bubble: Diagnosing and Curing Popularity Bias in Generative RecommendersJun Yin, Bangguo Zhu, Peng Huo et al.
Recently, Generative Recommenders (GRs), characterized by a unified end-to-end framework, have exhibited astonishing potential in transforming the recommendation paradigm. Despite their effectiveness, we recognize that GRs are still susceptible to the long-standing issue of popularity bias that has pervaded the recommendation community. Although a few studies have attempted to extend traditional debiasing methods to GRs, their effectiveness is marginal, and the fundamental reason why GRs suffer from popularity bias remains under-explored. To bridge this gap, this study focuses on two core aspects in GRs: the optimization of generative framework and the item tokenization based on semantic index. Based on theoretical analyses, we identify that the severe popularity bias emerges from the confluence of a token-level optimization flaw and the undifferentiated property of item tokenization. Accordingly, this study develops a novel generative recommender system, called Ghost, by designing the asymmetric unlikelihood optimization and the skeleton-founded tokenization. Extensive empirical evaluations across three datasets, alongside multiple SOTA baselines, reveal that Ghost substantially alleviates popularity bias and promotes fairer recommendations, while incurring slight degradation to the overall recommendation utility.
AIMar 2
What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper ReproductionLehui Li, Ruining Wang, Haochen Song et al.
Automated paper reproduction -- generating executable code from academic papers -- is bottlenecked not by information retrieval but by the tacit knowledge that papers inevitably leave implicit. We formalize this challenge as the progressive recovery of three types of tacit knowledge -- relational, somatic, and collective -- and propose \method, a graph-based agent framework with a dedicated mechanism for each: node-level relation-aware aggregation recovers relational knowledge by analyzing implementation-unit-level reuse and adaptation relationships between the target paper and its citation neighbors; execution-feedback refinement recovers somatic knowledge through iterative debugging driven by runtime signals; and graph-level knowledge induction distills collective knowledge from clusters of papers sharing similar implementations. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, \method{} achieves an average performance gap of 10.04\% against official implementations, improving over the strongest baseline by 24.68\%. The code will be publicly released upon acceptance; the repository link will be provided in the final version.
LGApr 16, 2024Code
What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement LearningZhihong Deng, Jing Jiang, Guodong Long et al.
In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair.
LGAug 19, 2021Code
Multi-Center Federated Learning: Clients Clustering for Better PersonalizationGuodong Long, Ming Xie, Tao Shen et al.
Personalized decision-making can be implemented in a Federated learning (FL) framework that can collaboratively train a decision model by extracting knowledge across intelligent clients, e.g. smartphones or enterprises. FL can mitigate the data privacy risk of collaborative training since it merely collects local gradients from users without access to their data. However, FL is fragile in the presence of statistical heterogeneity that is commonly encountered in personalized decision-making, e.g., non-IID data over different clients. Existing FL approaches usually update a single global model to capture the shared knowledge of all users by aggregating their gradients, regardless of the discrepancy between their data distributions. By comparison, a mixture of multiple global models could capture the heterogeneity across various clients if assigning the client to different global models (i.e., centers) in FL. To this end, we propose a novel multi-center aggregation mechanism to cluster clients using their models' parameters. It learns multiple global models from data as the cluster centers, and simultaneously derives the optimal matching between users and centers. We then formulate it as an optimization problem that can be efficiently solved by a stochastic expectation maximization (EM) algorithm. Experiments on multiple benchmark datasets of FL show that our method outperforms several popular baseline methods. The experimental source codes are publicly available on the Github repository https://github.com/mingxuts/multi-center-fed-learning .
LGSep 11, 2019Code
Learning to Propagate for Graph Meta-LearningLu Liu, Tianyi Zhou, Guodong Long et al.
Meta-learning extracts common knowledge from learning different tasks and uses it for unseen tasks. It can significantly improve tasks that suffer from insufficient training data, e.g., few shot learning. In most meta-learning methods, tasks are implicitly related by sharing parameters or optimizer. In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations of their output dimensions (e.g., classes) can significantly improve few shot learning. The graph's structure is usually free or cheap to obtain but has rarely been explored in previous works. We develop a novel meta-learner of this type for prototype-based classification, in which a prototype is generated for each class, such that the nearest neighbor search among the prototypes produces an accurate classification. The meta-learner, called "Gated Propagation Network (GPN)", learns to propagate messages between prototypes of different classes on the graph, so that learning the prototype of each class benefits from the data of other related classes. In GPN, an attention mechanism aggregates messages from neighboring classes of each class, with a gate choosing between the aggregated message and the message from the class itself. We train GPN on a sequence of tasks from many-shot to few shot generated by subgraph sampling. During training, it is able to reuse and update previously achieved prototypes from the memory in a life-long learning cycle. In experiments, under different training-test discrepancy and test task generation settings, GPN outperforms recent meta-learning methods on two benchmark datasets. The code of GPN and dataset generation is available at https://github.com/liulu112601/Gated-Propagation-Net.
SIJan 14, 2019Code
Search Efficient Binary Network EmbeddingDaokun Zhang, Jie Yin, Xingquan Zhu et al.
Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily applied to the vector-format node representations for network analysis. However, the learned continuous vector representations are inefficient for large-scale similarity search, which often involves finding nearest neighbors measured by distance or similarity in a continuous vector space. In this paper, we propose a search efficient binary network embedding algorithm called BinaryNE to learn a binary code for each node, by simultaneously modeling node context relations and node attribute relations through a three-layer neural network. BinaryNE learns binary node representations through a stochastic gradient descent based online learning algorithm. The learned binary encoding not only reduces memory usage to represent each node, but also allows fast bit-wise comparisons to support faster node similarity search than using Euclidean distance or other distance measures. Extensive experiments and comparisons demonstrate that BinaryNE not only delivers more than 25 times faster search speed, but also provides comparable or better search quality than traditional continuous vector based network embedding methods. The binary codes learned by BinaryNE also render competitive performance on node classification and node clustering tasks. The source code of this paper is available at https://github.com/daokunzhang/BinaryNE.
SIJan 14, 2019Code
Attributed Network Embedding via Subspace DiscoveryDaokun Zhang, Jie Yin, Xingquan Zhu et al.
Network embedding aims to learn a latent, low-dimensional vector representations of network nodes, effective in supporting various network analytic tasks. While prior arts on network embedding focus primarily on preserving network topology structure to learn node representations, recently proposed attributed network embedding algorithms attempt to integrate rich node content information with network topological structure for enhancing the quality of network embedding. In reality, networks often have sparse content, incomplete node attributes, as well as the discrepancy between node attribute feature space and network structure space, which severely deteriorates the performance of existing methods. In this paper, we propose a unified framework for attributed network embedding-attri2vec-that learns node embeddings by discovering a latent node attribute subspace via a network structure guided transformation performed on the original attribute space. The resultant latent subspace can respect network structure in a more consistent way towards learning high-quality node representations. We formulate an optimization problem which is solved by an efficient stochastic gradient descent algorithm, with linear time complexity to the number of nodes. We investigate a series of linear and non-linear transformations performed on node attributes and empirically validate their effectiveness on various types of networks. Another advantage of attri2vec is its ability to solve out-of-sample problems, where embeddings of new coming nodes can be inferred from their node attributes through the learned mapping function. Experiments on various types of networks confirm that attri2vec is superior to state-of-the-art baselines for node classification, node clustering, as well as out-of-sample link prediction tasks. The source code of this paper is available at https://github.com/daokunzhang/attri2vec.
LGJan 3, 2019Code
A Comprehensive Survey on Graph Neural NetworksZonghan Wu, Shirui Pan, Fengwen Chen et al.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
SIDec 4, 2017Code
Network Representation Learning: A SurveyDaokun Zhang, Jie Yin, Xingquan Zhu et al.
With the widespread use of information technologies, information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, telecommunication networks, and biological networks. Analyzing these networks sheds light on different aspects of social life such as the structure of societies, information diffusion, and communication patterns. In reality, however, the large scale of information networks often makes network analytic tasks computationally expensive or intractable. Network representation learning has been recently proposed as a new learning paradigm to embed network vertices into a low-dimensional vector space, by preserving network topology structure, vertex content, and other side information. This facilitates the original network to be easily handled in the new vector space for further analysis. In this survey, we perform a comprehensive review of the current literature on network representation learning in the data mining and machine learning field. We propose new taxonomies to categorize and summarize the state-of-the-art network representation learning techniques according to the underlying learning mechanisms, the network information intended to preserve, as well as the algorithmic designs and methodologies. We summarize evaluation protocols used for validating network representation learning including published benchmark datasets, evaluation methods, and open source algorithms. We also perform empirical studies to compare the performance of representative algorithms on common datasets, and analyze their computational complexity. Finally, we suggest promising research directions to facilitate future study.
LGDec 5, 2023
Foundation Models for Weather and Climate Data Understanding: A Comprehensive SurveyShengchao Chen, Guodong Long, Jing Jiang et al.
As artificial intelligence (AI) continues to rapidly evolve, the realm of Earth and atmospheric sciences is increasingly adopting data-driven models, powered by progressive developments in deep learning (DL). Specifically, DL techniques are extensively utilized to decode the chaotic and nonlinear aspects of Earth systems, and to address climate challenges via understanding weather and climate data. Cutting-edge performance on specific tasks within narrower spatio-temporal scales has been achieved recently through DL. The rise of large models, specifically large language models (LLMs), has enabled fine-tuning processes that yield remarkable outcomes across various downstream tasks, thereby propelling the advancement of general AI. However, we are still navigating the initial stages of crafting general AI for weather and climate. In this survey, we offer an exhaustive, timely overview of state-of-the-art AI methodologies specifically engineered for weather and climate data, with a special focus on time series and text data. Our primary coverage encompasses four critical aspects: types of weather and climate data, principal model architectures, model scopes and applications, and datasets for weather and climate. Furthermore, in relation to the creation and application of foundation models for weather and climate data understanding, we delve into the field's prevailing challenges, offer crucial insights, and propose detailed avenues for future research. This comprehensive approach equips practitioners with the requisite knowledge to make substantial progress in this domain. Our survey encapsulates the most recent breakthroughs in research on large, data-driven models for weather and climate data understanding, emphasizing robust foundations, current advancements, practical applications, crucial resources, and prospective research opportunities.
CVApr 13, 2024
Seeing Text in the Dark: Algorithm and BenchmarkChengpei Xu, Hao Fu, Long Ma et al.
Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal light datasets. The code and dataset will be released.
AO-PHMay 24, 2024
Personalized Adapter for Large Meteorology Model on Devices: Towards Weather Foundation ModelsShengchao Chen, Guodong Long, Jing Jiang et al.
This paper demonstrates that pre-trained language models (PLMs) are strong foundation models for on-device meteorological variables modeling. We present LM-Weather, a generic approach to taming PLMs, that have learned massive sequential knowledge from the universe of natural language databases, to acquire an immediate capability to obtain highly customized models for heterogeneous meteorological data on devices while keeping high efficiency. Concretely, we introduce a lightweight personalized adapter into PLMs and endows it with weather pattern awareness. During communication between clients and the server, low-rank-based transmission is performed to effectively fuse the global knowledge among devices while maintaining high communication efficiency and ensuring privacy. Experiments on real-wold dataset show that LM-Weather outperforms the state-of-the-art results by a large margin across various tasks (e.g., forecasting and imputation at different scales). We provide extensive and in-depth analyses experiments, which verify that LM-Weather can (1) indeed leverage sequential knowledge from natural language to accurately handle meteorological sequence, (2) allows each devices obtain highly customized models under significant heterogeneity, and (3) generalize under data-limited and out-of-distribution (OOD) scenarios.
LGDec 12, 2024
Federated Foundation Models on Heterogeneous Time SeriesShengchao Chen, Guodong Long, Jing Jiang et al.
Training a general-purpose time series foundation models with robust generalization capabilities across diverse applications from scratch is still an open challenge. Efforts are primarily focused on fusing cross-domain time series datasets to extract shared subsequences as tokens for training models on Transformer architecture. However, due to significant statistical heterogeneity across domains, this cross-domain fusing approach doesn't work effectively as the same as fusing texts and images. To tackle this challenge, this paper proposes a novel federated learning approach to address the heterogeneity in time series foundation models training, namely FFTS. Specifically, each data-holding organization is treated as an independent client in a collaborative learning framework with federated settings, and then many client-specific local models will be trained to preserve the unique characteristics per dataset. Moreover, a new regularization mechanism will be applied to both client-side and server-side, thus to align the shared knowledge across heterogeneous datasets from different domains. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed federated learning approach. The newly learned time series foundation models achieve superior generalization capabilities on cross-domain time series analysis tasks, including forecasting, imputation, and anomaly detection.
LGFeb 13, 2025
Biologically Plausible Brain Graph TransformerCiyuan Peng, Yuelong Huang, Qichao Dong et al.
State-of-the-art brain graph analysis methods fail to fully encode the small-world architecture of brain graphs (accompanied by the presence of hubs and functional modules), and therefore lack biological plausibility to some extent. This limitation hinders their ability to accurately represent the brain's structural and functional properties, thereby restricting the effectiveness of machine learning models in tasks such as brain disorder detection. In this work, we propose a novel Biologically Plausible Brain Graph Transformer (BioBGT) that encodes the small-world architecture inherent in brain graphs. Specifically, we present a network entanglement-based node importance encoding technique that captures the structural importance of nodes in global information propagation during brain graph communication, highlighting the biological properties of the brain structure. Furthermore, we introduce a functional module-aware self-attention to preserve the functional segregation and integration characteristics of brain graphs in the learned representations. Experimental results on three benchmark datasets demonstrate that BioBGT outperforms state-of-the-art models, enhancing biologically plausible brain graph representations for various brain graph analytical tasks
CVOct 16, 2024
Mind the Gap Between Prototypes and Images in Cross-domain FinetuningHongduan Tian, Feng Liu, Zhanke Zhou et al.
In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), to adapt different transformations respectively for prototypes and images similarly to CLIP by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve minimal validation loss at the enlarged gap.
IROct 11, 2024
Federated Vision-Language-Recommendation with Personalized FusionZhiwei Li, Guodong Long, Jing Jiang et al.
Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device intelligence within a federated learning framework is a crucial step for enhancing user privacy and delivering personalized experiences. This paper introduces FedVLR, a federated VLR framework specially designed for user-specific personalized fusion of vision-language representations. At its core is a novel bi-level fusion mechanism: The server-side multi-view fusion module first generates a diverse set of pre-fused multimodal views. Subsequently, each client employs a user-specific mixture-of-expert mechanism to adaptively integrate these views based on individual user interaction history. This designed lightweight personalized fusion module provides an efficient solution to implement a federated VLR system. The effectiveness of our proposed FedVLR has been validated on seven benchmark datasets.
AIApr 22, 2025
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM AgentsSiyu Zhou, Tianyi Zhou, Yijun Yang et al.
Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between the prior knowledge of LLMs and the specified environment's dynamics usually bottlenecks LLMs' performance as world models. To bridge the gap, we propose a training-free "world alignment" that learns an environment's symbolic knowledge complementary to LLMs. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which are extracted by LLMs from exploration trajectories and encoded into executable codes to regulate LLM agents' policies. We further propose an RL-free, model-based agent "WALL-E 2.0" through the model-predictive control (MPC) framework. Unlike classical MPC requiring costly optimization on the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future steps' actions by interacting with the neurosymbolic world model. While the LLM agent's strong heuristics make it an efficient planner in MPC, the quality of its planned actions is also secured by the accurate predictions of the aligned world model. They together considerably improve learning efficiency in a new environment. On open-world challenges in Mars (Minecraft like) and ALFWorld (embodied indoor environments), WALL-E 2.0 significantly outperforms existing methods, e.g., surpassing baselines in Mars by 16.1%-51.6% of success rate and by at least 61.7% in score. In ALFWorld, it achieves a new record 98% success rate after only 4 iterations.
LGFeb 21
From Few-Shot to Zero-Shot: Towards Generalist Graph Anomaly DetectionYixin Liu, Shiyuan Li, Yu Zheng et al.
Graph anomaly detection (GAD) is critical for identifying abnormal nodes in graph-structured data from diverse domains, including cybersecurity and social networks. The existing GAD methods often focus on the learning paradigms of "one-model-for-one-dataset", requiring dataset-specific training for each dataset to achieve optimal performance. However, this paradigm suffers from significant limitations, such as high computational and data costs, limited generalization and transferability to new datasets, and challenges in privacy-sensitive scenarios where access to full datasets or sufficient labels is restricted. To address these limitations, we propose a novel generalist GAD paradigm that aims to develop a unified model capable of detecting anomalies on multiple unseen datasets without extensive retraining/fine-tuning or dataset-specific customization. To this end, we propose ARC, a few-shot generalist GAD method that leverages in-context learning and requires only a few labeled normal samples at inference time. Specifically, ARC consists of three core modules: a feature Alignment module to unify and align features across datasets, a Residual GNN encoder to capture dataset-agnostic anomaly representations, and a cross-attentive in-Context learning module to score anomalies using few-shot normal context. Building on ARC, we further introduce ARC_zero for the zero-shot generalist GAD setting, which selects representative pseudo-normal nodes via a pseudo-context mechanism and thus enables fully label-free inference on unseen datasets. Extensive experiments on 17 real-world graph datasets demonstrate that both ARC and ARC_zero effectively detect anomalies, exhibit strong generalization ability, and perform efficiently under few-shot and zero-shot settings.
LGMar 7
Rethinking Deep Research from the Perspective of Web Content Distribution MatchingZixuan Yu, Zhenheng Tang, Tongliang Liu et al.
Despite the integration of search tools, Deep Search Agents often suffer from a misalignment between reasoning-driven queries and the underlying web indexing structures. Existing frameworks treat the search engine as a static utility, leading to queries that are either too coarse or too granular to retrieve precise evidence. We propose WeDas, a Web Content Distribution Aware framework that incorporates search-space structural characteristics into the agent's observation space. Central to our method is the Query-Result Alignment Score, a metric quantifying the compatibility between agent intent and retrieval outcomes. To overcome the intractability of indexing the dynamic web, we introduce a few-shot probing mechanism that iteratively estimates this score via limited query accesses, allowing the agent to dynamically recalibrate sub-goals based on the local content landscape. As a plug-and-play module, WeDas consistently improves sub-goal completion and accuracy across four benchmarks, effectively bridging the gap between high-level reasoning and low-level retrieval.
LGJun 13, 2025
Learn to Preserve Personality: Federated Foundation Models in RecommendationsZhiwei Li, Guodong Long, Chunxu Zhang et al.
A core learning challenge for existed Foundation Models (FM) is striking the tradeoff between generalization with personalization, which is a dilemma that has been highlighted by various parameter-efficient adaptation techniques. Federated foundation models (FFM) provide a structural means to decouple shared knowledge from individual specific adaptations via decentralized processes. Recommendation systems offer a perfect testbed for FFMs, given their reliance on rich implicit feedback reflecting unique user characteristics. This position paper discusses a novel learning paradigm where FFMs not only harness their generalization capabilities but are specifically designed to preserve the integrity of user personality, illustrated thoroughly within the recommendation contexts. We envision future personal agents, powered by personalized adaptive FMs, guiding user decisions on content. Such an architecture promises a user centric, decentralized system where individuals maintain control over their personalized agents.
CYApr 26, 2024
Algorithmic Fairness: A Tolerance PerspectiveRenqiang Luo, Tao Tang, Feng Xia et al.
Recent advancements in machine learning and deep learning have brought algorithmic fairness into sharp focus, illuminating concerns over discriminatory decision making that negatively impacts certain individuals or groups. These concerns have manifested in legal, ethical, and societal challenges, including the erosion of trust in intelligent systems. In response, this survey delves into the existing literature on algorithmic fairness, specifically highlighting its multifaceted social consequences. We introduce a novel taxonomy based on 'tolerance', a term we define as the degree to which variations in fairness outcomes are acceptable, providing a structured approach to understanding the subtleties of fairness within algorithmic decisions. Our systematic review covers diverse industries, revealing critical insights into the balance between algorithmic decision making and social equity. By synthesizing these insights, we outline a series of emerging challenges and propose strategic directions for future research and policy making, with the goal of advancing the field towards more equitable algorithmic systems.
AIMar 9
Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational DatabasesJun Yin, Peng Huo, Bangguo Zhu et al.
In recent advances, to enable a fully data-driven learning paradigm on relational databases (RDB), relational deep learning (RDL) is proposed to structure the RDB as a heterogeneous entity graph and adopt the graph neural network (GNN) as the predictive model. However, existing RDL methods neglect the imbalance problem of relational data in RDBs and risk under-representing the minority entities, leading to an unusable model in practice. In this work, we investigate, for the first time, class imbalance problem in RDB entity classification and design the relation-centric minority synthetic over-sampling GNN (Rel-MOSS), in order to fill a critical void in the current literature. Specifically, to mitigate the issue of minority-related information being submerged by majority counterparts, we design the relation-wise gating controller to modulate neighborhood messages from each individual relation type. Based on the relational-gated representations, we further propose the relation-guided minority synthesizer for over-sampling, which integrates the entity relational signatures to maintain relational consistency. Extensive experiments on 12 entity classification datasets provide compelling evidence for the superiority of Rel-MOSS, yielding an average improvement of up to 2.46% and 4.00% in terms of Balanced Accuracy and G-Mean, compared with SOTA RDL methods and classic methods for handling class imbalance.
CVOct 12, 2025
Taming a Retrieval Framework to Read Images in Humanlike Manner for Augmenting Generation of MLLMsSuyang Xi, Chenxi Yang, Hong Ding et al.
Multimodal large language models (MLLMs) often fail in fine-grained visual question answering, producing hallucinations about object identities, positions, and relations because textual queries are not explicitly anchored to visual referents. Retrieval-augmented generation (RAG) alleviates some errors, but it fails to align with human-like processing at both the retrieval and augmentation levels. Specifically, it focuses only on global-level image information but lacks local detail and limits reasoning about fine-grained interactions. To overcome this limitation, we present Human-Like Retrieval-Augmented Generation (HuLiRAG), a framework that stages multimodal reasoning as a ``what--where--reweight'' cascade. Queries are first anchored to candidate referents via open-vocabulary detection (what), then spatially resolved with SAM-derived masks to recover fine-grained precision (where), and adaptively prioritized through the trade-off between local and global alignment (reweight). Mask-guided fine-tuning further injects spatial evidence into the generation process, transforming grounding from a passive bias into an explicit constraint on answer formulation. Extensive experiments demonstrate that this human-like cascade improves grounding fidelity and factual consistency while reducing hallucinations, advancing multimodal question answering toward trustworthy reasoning.
LGSep 28, 2025
Estimating Time Series Foundation Model Transferability via In-Context LearningQingren Yao, Ming Jin, Chengqi Zhang et al.
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training, yet fine-tuning remains critical for boosting performance in domains with limited public data. With the growing number of TSFMs, efficiently identifying the best model for downstream fine-tuning becomes increasingly challenging. In this work, we introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem: given observations on known (source) datasets, it predicts how a TSFM will perform after fine-tuning on a downstream (target) dataset. TimeTic flexibly organizes the observed model-data relationships as contextual information, allowing it to adapt seamlessly to various test-time scenarios. Leveraging the natural tabular structure formed by dataset meta-features, model characteristics, and fine-tuned performance, we employ tabular foundation models to serve as in-context learners. We further introduce a novel model characterization based on entropy evolution across model layers, capturing embedding-space distinctions and enabling TimeTic to generalize across arbitrary model sets. We establish a comprehensive benchmark for transferability estimation including 10 datasets, 10 foundation models, and 3 forecasting tasks. On this benchmark, TimeTic's estimation demonstrates strong alignment with actual fine-tuned performance for previously unseen datasets, achieving a mean rank correlation of approximately 0.6 and a 30% improvement compared to using zero-shot performance as the transferability score.
NEJun 20, 2025
Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement LearningChengpeng Hu, Ziming Wang, Bo Yuan et al.
Dynamic material handling (DMH) involves the assignment of dynamically arriving material transporting tasks to suitable vehicles in real time for minimising makespan and tardiness. In real-world scenarios, historical task records are usually available, which enables the training of a decision policy on multiple instances consisting of historical records. Recently, reinforcement learning has been applied to solve DMH. Due to the occurrence of dynamic events such as new tasks, adaptability is highly required. Solving DMH is challenging since constraints including task delay should be satisfied. A feedback is received only when all tasks are served, which leads to sparse reward. Besides, making the best use of limited computational resources and historical records for training a robust policy is crucial. The time allocated to different problem instances would highly impact the learning process. To tackle those challenges, this paper proposes a novel adaptive constrained evolutionary reinforcement learning (ACERL) approach, which maintains a population of actors for diverse exploration. ACERL accesses each actor for tackling sparse rewards and constraint violation to restrict the behaviour of the policy. Moreover, ACERL adaptively selects the most beneficial training instances for improving the policy. Extensive experiments on eight training and eight unseen test instances demonstrate the outstanding performance of ACERL compared with several state-of-the-art algorithms. Policies trained by ACERL can schedule the vehicles while fully satisfying the constraints. Additional experiments on 40 unseen noised instances show the robust performance of ACERL. Cross-validation further presents the overall effectiveness of ACREL. Besides, a rigorous ablation study highlights the coordination and benefits of each ingredient of ACERL.
LGApr 9, 2025
FedMerge: Federated Personalization via Model MergingShutong Chen, Tianyi Zhou, Guodong Long et al.
One global model in federated learning (FL) might not be sufficient to serve many clients with non-IID tasks and distributions. While there has been advances in FL to train multiple global models for better personalization, they only provide limited choices to clients so local finetuning is still indispensable. In this paper, we propose a novel ``FedMerge'' approach that can create a personalized model per client by simply merging multiple global models with automatically optimized and customized weights. In FedMerge, a few global models can serve many non-IID clients, even without further local finetuning. We formulate this problem as a joint optimization of global models and the merging weights for each client. Unlike existing FL approaches where the server broadcasts one or multiple global models to all clients, the server only needs to send a customized, merged model to each client. Moreover, instead of periodically interrupting the local training and re-initializing it to a global model, the merged model aligns better with each client's task and data distribution, smoothening the local-global gap between consecutive rounds caused by client drift. We evaluate FedMerge on three different non-IID settings applied to different domains with diverse tasks and data types, in which FedMerge consistently outperforms existing FL approaches, including clustering-based and mixture-of-experts (MoE) based methods.
IRMay 12, 2024
Navigating the Future of Federated Recommendation Systems with Foundation ModelsZhiwei Li, Guodong Long, Chunxu Zhang et al.
Federated Recommendation Systems (FRSs) offer a privacy-preserving alternative to traditional centralized approaches by decentralizing data storage. However, they face persistent challenges such as data sparsity and heterogeneity, largely due to isolated client environments. Recent advances in Foundation Models (FMs), particularly large language models like ChatGPT, present an opportunity to surmount these issues through powerful, cross-task knowledge transfer. In this position paper, we systematically examine the convergence of FRSs and FMs, illustrating how FM-enhanced frameworks can substantially improve client-side personalization, communication efficiency, and server-side aggregation. We also delve into pivotal challenges introduced by this integration, including privacy-security trade-offs, non-IID data, and resource constraints in federated setups, and propose prospective research directions in areas such as multimodal recommendation, real-time FM adaptation, and explainable federated reasoning. By unifying FRSs with FMs, our position paper provides a forward-looking roadmap for advancing privacy-preserving, high-performance recommendation systems that fully leverage large-scale pre-trained knowledge to enhance local performance.
LGFeb 6, 2024
Transductive Reward Inference on GraphBohao Qu, Xiaofeng Cao, Qing Guo et al.
In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too costly or unethical and the reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs that capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point during several iterations of the transductive inference process and demonstrate its at least convergence to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. The application of our inferred rewards improves the performance in offline reinforcement learning tasks.
LGMay 23, 2023
Federated Prompt Learning for Weather Foundation Models on DevicesShengchao Chen, Guodong Long, Tao Shen et al.
On-device intelligence for weather forecasting uses local deep learning models to analyze weather patterns without centralized cloud computing, holds significance for supporting human activates. Federated Learning is a promising solution for such forecasting by enabling collaborative model training without sharing raw data. However, it faces three main challenges that hinder its reliability: (1) data heterogeneity among devices due to geographic differences; (2) data homogeneity within individual devices and (3) communication overload from sending large model parameters for collaboration. To address these challenges, this paper propose Federated Prompt Learning for Weather Foundation Models on Devices (FedPoD), which enables devices to obtain highly customized models while maintaining communication efficiency. Concretely, our Adaptive Prompt Tuning leverages lightweight prompts guide frozen foundation model to generate more precise predictions, also conducts prompt-based multi-level communication to encourage multi-source knowledge fusion and regulate optimization. Additionally, Dynamic Graph Modeling constructs graphs from prompts, prioritizing collaborative training among devices with similar data distributions to against heterogeneity. Extensive experiments demonstrates FedPoD leads the performance among state-of-the-art baselines across various setting in real-world on-device weather forecasting datasets.
LGFeb 13, 2022
On the Convergence of Clustered Federated LearningJie Ma, Guodong Long, Tianyi Zhou et al.
Knowledge sharing and model personalization are essential components to tackle the non-IID challenge in federated learning (FL). Most existing FL methods focus on two extremes: 1) to learn a shared model to serve all clients with non-IID data, and 2) to learn personalized models for each client, namely personalized FL. There is a trade-off solution, namely clustered FL or cluster-wise personalized FL, which aims to cluster similar clients into one cluster, and then learn a shared model for all clients within a cluster. This paper is to revisit the research of clustered FL by formulating them into a bi-level optimization framework that could unify existing methods. We propose a new theoretical analysis framework to prove the convergence by considering the clusterability among clients. In addition, we embody this framework in an algorithm, named Weighted Clustered Federated Learning (WeCFL). Empirical analysis verifies the theoretical results and demonstrates the effectiveness of the proposed WeCFL under the proposed cluster-wise non-IID settings.
CLSep 21, 2021
Generalization in Text-based Games via Hierarchical Reinforcement LearningYunqiu Xu, Meng Fang, Ling Chen et al.
Deep reinforcement learning provides a promising approach for text-based games in studying natural language communication between humans and artificial agents. However, the generalization still remains a big challenge as the agents depend critically on the complexity and variety of training tasks. In this paper, we address this problem by introducing a hierarchical framework built upon the knowledge graph-based RL agent. In the high level, a meta-policy is executed to decompose the whole game into a set of subtasks specified by textual goals, and select one of them based on the KG. Then a sub-policy in the low level is executed to conduct goal-conditioned reinforcement learning. We carry out experiments on games with various difficulty levels and show that the proposed method enjoys favorable generalizability.
DCAug 24, 2021
Federated Learning for Open BankingGuodong Long, Yue Tan, Jing Jiang et al.
Open banking enables individual customers to own their banking data, which provides fundamental support for the boosting of a new ecosystem of data marketplaces and financial services. In the near future, it is foreseeable to have decentralized data ownership in the finance sector using federated learning. This is a just-in-time technology that can learn intelligent models in a decentralized training manner. The most attractive aspect of federated learning is its ability to decompose model training into a centralized server and distributed nodes without collecting private data. This kind of decomposed learning framework has great potential to protect users' privacy and sensitive data. Therefore, federated learning combines naturally with an open banking data marketplaces. This chapter will discuss the possible challenges for applying federated learning in the context of open banking, and the corresponding solutions have been explored as well.
AIJul 20, 2021
MIPO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation LearningXueping Peng, Guodong Long, Tao Shen et al.
Representation learning on electronic health records (EHRs) plays a vital role in downstream medical prediction tasks. Although natural language processing techniques, such as recurrent neural networks, and self-attention, have been adapted for learning medical representations from hierarchical, time-stamped EHR data, they often struggle when either general or task-specific data are limited. Recent efforts have attempted to mitigate this challenge by incorporating medical ontologies (i.e., knowledge graphs) into self-supervised tasks like diagnosis prediction. However, two main issues remain: (1) small and uniform ontologies that lack diversity for robust learning, and (2) insufficient attention to the critical contexts or dependencies underlying patient journeys, which could further enhance ontology-based learning. To address these gaps, we propose MIPO (Mutual Integration of Patient Journey and Medical Ontology), a robust end-to-end framework that employs a Transformer-based architecture for representation learning. MIPO emphasizes task-specific representation learning through a sequential diagnosis prediction task, while also incorporating an ontology-based disease-typing task. A graph-embedding module is introduced to integrate information from patient visit records, thus alleviating data insufficiency. This setup creates a mutually reinforcing loop, where both patient-journey embedding and ontology embedding benefit from each other. We validate MIPO on two real-world benchmark datasets, showing that it consistently outperforms baseline methods under both sufficient and limited data conditions. Furthermore, the resulting diagnosis embeddings offer improved interpretability, underscoring the promise of MIPO for real-world healthcare applications.
LGJul 10, 2021
Beyond Low-pass Filtering: Graph Convolutional Networks with Automatic FilteringZonghan Wu, Shirui Pan, Guodong Long et al.
Graph convolutional networks are becoming indispensable for deep learning from graph-structured data. Most of the existing graph convolutional networks share two big shortcomings. First, they are essentially low-pass filters, thus the potentially useful middle and high frequency band of graph signals are ignored. Second, the bandwidth of existing graph convolutional filters is fixed. Parameters of a graph convolutional filter only transform the graph inputs without changing the curvature of a graph convolutional filter function. In reality, we are uncertain about whether we should retain or cut off the frequency at a certain point unless we have expert domain knowledge. In this paper, we propose Automatic Graph Convolutional Networks (AutoGCN) to capture the full spectrum of graph signals and automatically update the bandwidth of graph convolutional filters. While it is based on graph spectral theory, our AutoGCN is also localized in space and has a spatial form. Experimental results show that AutoGCN achieves significant improvement over baseline methods which only work as low-pass filters.
LGMay 1, 2021
FedProto: Federated Prototype Learning across Heterogeneous ClientsYue Tan, Guodong Long, Lu Liu et al.
Heterogeneity across clients in federated learning (FL) usually hinders the optimization convergence and generalization performance when the aggregation of clients' knowledge occurs in the gradient space. For example, clients may differ in terms of data distribution, network latency, input/output space, and/or model architecture, which can easily lead to the misalignment of their local gradients. To improve the tolerance to heterogeneity, we propose a novel federated prototype learning (FedProto) framework in which the clients and server communicate the abstract class prototypes instead of the gradients. FedProto aggregates the local prototypes collected from different clients, and then sends the global prototypes back to all clients to regularize the training of local models. The training on each client aims to minimize the classification error on the local data while keeping the resulting local prototypes sufficiently close to the corresponding global ones. Moreover, we provide a theoretical analysis to the convergence rate of FedProto under non-convex objectives. In experiments, we propose a benchmark setting tailored for heterogeneous FL, with FedProto outperforming several recent FL approaches on multiple datasets.
CVFeb 3, 2021
Isometric Propagation Network for Generalized Zero-shot LearningLu Liu, Tianyi Zhou, Guodong Long et al.
Zero-shot learning (ZSL) aims to classify images of an unseen class only based on a few attributes describing that class but no access to any training sample. A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data. Thus, an unseen class image can be ideally mapped to its corresponding class attributes. The key challenge is how to align the representations in the two spaces. For most ZSL settings, the attributes for each seen/unseen class are only represented by a vector while the seen-class data provide much more information. Thus, the imbalanced supervision from the semantic and the visual space can make the learned mapping easily overfitting to the seen classes. To resolve this problem, we propose Isometric Propagation Network (IPN), which learns to strengthen the relation between classes within each space and align the class dependency in the two spaces. Specifically, IPN learns to propagate the class representations on an auto-generated graph within each space. In contrast to only aligning the resulted static representation, we regularize the two dynamic propagation procedures to be isometric in terms of the two graphs' edge weights per step by minimizing a consistency loss between them. IPN achieves state-of-the-art performance on three popular ZSL benchmarks. To evaluate the generalization capability of IPN, we further build two larger benchmarks with more diverse unseen classes and demonstrate the advantages of IPN on them.
LGDec 19, 2020
Towards Coarse and Fine-grained Multi-Graph Multi-Label LearningYejiang Wang, Yuhai Zhao, Zhengkui Wang et al.
Multi-graph multi-label learning (\textsc{Mgml}) is a supervised learning framework, which aims to learn a multi-label classifier from a set of labeled bags each containing a number of graphs. Prior techniques on the \textsc{Mgml} are developed based on transfering graphs into instances and focus on learning the unseen labels only at the bag level. In this paper, we propose a \textit{coarse} and \textit{fine-grained} Multi-graph Multi-label (cfMGML) learning framework which directly builds the learning model over the graphs and empowers the label prediction at both the \textit{coarse} (aka. bag) level and \textit{fine-grained} (aka. graph in each bag) level. In particular, given a set of labeled multi-graph bags, we design the scoring functions at both graph and bag levels to model the relevance between the label and data using specific graph kernels. Meanwhile, we propose a thresholding rank-loss objective function to rank the labels for the graphs and bags and minimize the hamming-loss simultaneously at one-step, which aims to addresses the error accumulation issue in traditional rank-loss algorithms. To tackle the non-convex optimization problem, we further develop an effective sub-gradient descent algorithm to handle high-dimensional space computation required in cfMGML. Experiments over various real-world datasets demonstrate cfMGML achieves superior performance than the state-of-arts algorithms.
LGNov 2, 2020
Cooperative Heterogeneous Deep Reinforcement LearningHan Zheng, Pengfei Wei, Jing Jiang et al.
Numerous deep reinforcement learning agents have been proposed, and each of them has its strengths and flaws. In this work, we present a Cooperative Heterogeneous Deep Reinforcement Learning (CHDRL) framework that can learn a policy by integrating the advantages of heterogeneous agents. Specifically, we propose a cooperative learning framework that classifies heterogeneous agents into two classes: global agents and local agents. Global agents are off-policy agents that can utilize experiences from the other agents. Local agents are either on-policy agents or population-based evolutionary algorithms (EAs) agents that can explore the local area effectively. We employ global agents, which are sample-efficient, to guide the learning of local agents so that local agents can benefit from sample-efficient agents and simultaneously maintain their advantages, e.g., stability. Global agents also benefit from effective local searches. Experimental studies on a range of continuous control tasks from the Mujoco benchmark show that CHDRL achieves better performance compared with state-of-the-art baselines.