Yuwen Yang

LG
h-index9
14papers
391citations
Novelty52%
AI Score47

14 Papers

CLJun 2
AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification

Anling Xiang, Yuwen Yang, Yang Shen

Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose AutoTail-BSFGM, a class-balance-aware fine-tuning method that combines an automatically gated tail-prior adjustment, a weak Balanced Softmax auxiliary loss, and Fast Gradient Method adversarial regularization. The method changes only the training objective and procedure; inference uses the same single base-size encoder and linear classifier as the corresponding label-smoothed baseline. We evaluate the method on two CSL-based tasks: an abstract-to-discipline task with 67 labels and a title-to-category task with 13 categories. On the primary abstract task, AutoTail-BSFGM improves validation and lockbox accuracy under both Chinese RoBERTa-WWM and MacBERT-base. With MacBERT-base, validation accuracy increases by 0.83 percentage points and lockbox accuracy by 0.49 points, with a pooled paired McNemar signal on validation (p = 0.023). On the title task, the method improves validation accuracy by 0.70 points and validation balanced accuracy by 2.64 points; lockbox accuracy is approximately neutral while lockbox balanced accuracy improves by 1.22 points. The results support a bounded contribution: AutoTail-BSFGM improves class-balance-sensitive behavior and yields consistent gains for abstract-based scholarly classification, without uniformly improving every metric on every split.

AIMay 28
Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale

Canran Wang, Yuwen Yang, Zhen Wang et al.

The double-edged sword of integrating Large Language Models (LLMs) requires an effective triadic collaboration mechanism among LLMs, teachers and students, especially for K-12 education. By developing a triadic collaboration system to support K-12 writing learning, a multidimensional evaluation framework grounded in Systemic Functional Linguistics and the suggestion trajectory tracing pipeline, this paper contributes a large-scale empirical dataset involving $57,954$ essays from $10,195$ students across $120$ schools over two years. Our findings confirm the efficacy of this system in improving writing quality through a strategic labor division: the LLM serves as a generative engine to mitigate teacher burnout, and the teacher acts as a pedagogical gatekeeper and bridge to guarantee feedback quality. While both LLM and teacher are critical for skill improvement, we uncover a ceiling effect where excessive linguistic expansion yields diminishing marginal utility. These suggest a dynamically adaptive LLM-teacher collaboration as student proficiency increases.

CVNov 22, 2023
FedHCA$^2$: Towards Hetero-Client Federated Multi-Task Learning

Yuxiang Lu, Suizhi Huang, Yuwen Yang et al.

Federated Learning (FL) enables joint training across distributed clients using their local data privately. Federated Multi-Task Learning (FMTL) builds on FL to handle multiple tasks, assuming model congruity that identical model architecture is deployed in each client. To relax this assumption and thus extend real-world applicability, we introduce a novel problem setting, Hetero-Client Federated Multi-Task Learning (HC-FMTL), to accommodate diverse task setups. The main challenge of HC-FMTL is the model incongruity issue that invalidates conventional aggregation methods. It also escalates the difficulties in accurate model aggregation to deal with data and task heterogeneity inherent in FMTL. To address these challenges, we propose the FedHCA$^2$ framework, which allows for federated training of personalized models by modeling relationships among heterogeneous clients. Drawing on our theoretical insights into the difference between multi-task and federated optimization, we propose the Hyper Conflict-Averse Aggregation scheme to mitigate conflicts during encoder updates. Additionally, inspired by task interaction in MTL, the Hyper Cross Attention Aggregation scheme uses layer-wise cross attention to enhance decoder interactions while alleviating model incongruity. Moreover, we employ learnable Hyper Aggregation Weights for each client to customize personalized parameter updates. Extensive experiments demonstrate the superior performance of FedHCA$^2$ in various HC-FMTL scenarios compared to representative methods. Our code will be made publicly available.

LGOct 28, 2022
Completely Heterogeneous Federated Learning

Chang Liu, Yuwen Yang, Xun Cai et al.

Federated learning (FL) faces three major difficulties: cross-domain, heterogeneous models, and non-i.i.d. labels scenarios. Existing FL methods fail to handle the above three constraints at the same time, and the level of privacy protection needs to be lowered (e.g., the model architecture and data category distribution can be shared). In this work, we propose the challenging "completely heterogeneous" scenario in FL, which refers to that each client will not expose any private information including feature space, model architecture, and label distribution. We then devise an FL framework based on parameter decoupling and data-free knowledge distillation to solve the problem. Experiments show that our proposed method achieves high performance in completely heterogeneous scenarios where other approaches fail.

LGSep 16, 2023
UNIDEAL: Curriculum Knowledge Distillation Federated Learning

Yuwen Yang, Chang Liu, Xun Cai et al.

Federated Learning (FL) has emerged as a promising approach to enable collaborative learning among multiple clients while preserving data privacy. However, cross-domain FL tasks, where clients possess data from different domains or distributions, remain a challenging problem due to the inherent heterogeneity. In this paper, we present UNIDEAL, a novel FL algorithm specifically designed to tackle the challenges of cross-domain scenarios and heterogeneous model architectures. The proposed method introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning, which significantly enhances the effectiveness of knowledge distillation in FL settings. We conduct extensive experiments on various datasets, comparing UNIDEAL with state-of-the-art baselines. Our results demonstrate that UNIDEAL achieves superior performance in terms of both model accuracy and communication efficiency. Additionally, we provide a convergence analysis of the algorithm, showing a convergence rate of O(1/T) under non-convex conditions.

LGNov 1, 2022
Position-Aware Subgraph Neural Networks with Data-Efficient Learning

Chang Liu, Yuwen Yang, Zhe Xie et al.

Data-efficient learning on graphs (GEL) is essential in real-world applications. Existing GEL methods focus on learning useful representations for nodes, edges, or entire graphs with ``small'' labeled data. But the problem of data-efficient learning for subgraph prediction has not been explored. The challenges of this problem lie in the following aspects: 1) It is crucial for subgraphs to learn positional features to acquire structural information in the base graph in which they exist. Although the existing subgraph neural network method is capable of learning disentangled position encodings, the overall computational complexity is very high. 2) Prevailing graph augmentation methods for GEL, including rule-based, sample-based, adaptive, and automated methods, are not suitable for augmenting subgraphs because a subgraph contains fewer nodes but richer information such as position, neighbor, and structure. Subgraph augmentation is more susceptible to undesirable perturbations. 3) Only a small number of nodes in the base graph are contained in subgraphs, which leads to a potential ``bias'' problem that the subgraph representation learning is dominated by these ``hot'' nodes. By contrast, the remaining nodes fail to be fully learned, which reduces the generalization ability of subgraph representation learning. In this paper, we aim to address the challenges above and propose a Position-Aware Data-Efficient Learning framework for subgraph neural networks called PADEL. Specifically, we propose a novel node position encoding method that is anchor-free, and design a new generative subgraph augmentation method based on a diffused variational subgraph autoencoder, and we propose exploratory and exploitable views for subgraph contrastive learning. Extensive experiment results on three real-world datasets show the superiority of our proposed method over state-of-the-art baselines.

LGFeb 20, 2024Code
Federated Multi-Task Learning on Non-IID Data Silos: An Experimental Study

Yuwen Yang, Yuxiang Lu, Suizhi Huang et al.

The innovative Federated Multi-Task Learning (FMTL) approach consolidates the benefits of Federated Learning (FL) and Multi-Task Learning (MTL), enabling collaborative model training on multi-task learning datasets. However, a comprehensive evaluation method, integrating the unique features of both FL and MTL, is currently absent in the field. This paper fills this void by introducing a novel framework, FMTL-Bench, for systematic evaluation of the FMTL paradigm. This benchmark covers various aspects at the data, model, and optimization algorithm levels, and comprises seven sets of comparative experiments, encapsulating a wide array of non-independent and identically distributed (Non-IID) data partitioning scenarios. We propose a systematic process for comparing baselines of diverse indicators and conduct a case study on communication expenditure, time, and energy consumption. Through our exhaustive experiments, we aim to provide valuable insights into the strengths and limitations of existing baseline methods, contributing to the ongoing discourse on optimal FMTL application in practical scenarios. The source code can be found on https://github.com/youngfish42/FMTL-Benchmark .

LGOct 13, 2022
NoMorelization: Building Normalizer-Free Models from a Sample's Perspective

Chang Liu, Yuwen Yang, Yue Ding et al.

The normalizing layer has become one of the basic configurations of deep learning models, but it still suffers from computational inefficiency, interpretability difficulties, and low generality. After gaining a deeper understanding of the recent normalization and normalizer-free research works from a sample's perspective, we reveal the fact that the problem lies in the sampling noise and the inappropriate prior assumption. In this paper, we propose a simple and effective alternative to normalization, which is called "NoMorelization". NoMorelization is composed of two trainable scalars and a zero-centered noise injector. Experimental results demonstrate that NoMorelization is a general component for deep learning and is suitable for different model paradigms (e.g., convolution-based and attention-based models) to tackle different tasks (e.g., discriminative and generative tasks). Compared with existing mainstream normalizers (e.g., BN, LN, and IN) and state-of-the-art normalizer-free methods, NoMorelization shows the best speed-accuracy trade-off.

LGNov 19, 2022
EDEN: A Plug-in Equivariant Distance Encoding to Beyond the 1-WL Test

Chang Liu, Yuwen Yang, Yue Ding et al.

The message-passing scheme is the core of graph representation learning. While most existing message-passing graph neural networks (MPNNs) are permutation-invariant in graph-level representation learning and permutation-equivariant in node- and edge-level representation learning, their expressive power is commonly limited by the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test. Recently proposed expressive graph neural networks (GNNs) with specially designed complex message-passing mechanisms are not practical. To bridge the gap, we propose a plug-in Equivariant Distance ENcoding (EDEN) for MPNNs. EDEN is derived from a series of interpretable transformations on the graph's distance matrix. We theoretically prove that EDEN is permutation-equivariant for all level graph representation learning, and we empirically illustrate that EDEN's expressive power can reach up to the 3-WL test. Extensive experiments on real-world datasets show that combining EDEN with conventional GNNs surpasses recent advanced GNNs.

DCJun 3, 2024
An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

Hang Dong, Liwen Zhu, Zhao Shan et al.

Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization but trying not to affect users' experience for cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with a discounted price to users. For this type of deferrable jobs, users are allowed to submit jobs that will run for a specific uninterrupted duration in a flexible range of time in the future with a great discount. With these deferrable jobs to be scheduled under the remaining capacity after deploying those on-demand jobs, it remains a challenge to achieve high resource utilization and meanwhile shorten the waiting time for users as much as possible in an online manner. In this paper, we propose an online deferrable job scheduling method called \textit{Online Scheduling for DEferrable jobs in Cloud} (\OSDEC{}), where a deep reinforcement learning model is adopted to learn the scheduling policy, and several auxiliary tasks are utilized to provide better state representations and improve the performance of the model. With the integrated reinforcement learning framework, the proposed method can well plan the deployment schedule and achieve a short waiting time for users while maintaining a high resource utilization for the platform. The proposed method is validated on a public dataset and shows superior performance.

LGJul 7, 2020
Learning Combined Set Covering and Traveling Salesman Problem

Yuwen Yang, Jayant Rajgopal

The Traveling Salesman Problem is one of the most intensively studied combinatorial optimization problems due both to its range of real-world applications and its computational complexity. When combined with the Set Covering Problem, it raises even more issues related to tractability and scalability. We study a combined Set Covering and Traveling Salesman problem and provide a mixed integer programming formulation to solve the problem. Motivated by applications where the optimal policy needs to be updated on a regular basis and repetitively solving this via MIP can be computationally expensive, we propose a machine learning approach to effectively deal with this problem by providing an opportunity to learn from historical optimal solutions that are derived from the MIP formulation. We also present a case study using the vaccine distribution chain of the World Health Organization, and provide numerical results with data derived from four countries in sub-Saharan Africa.

ITDec 27, 2019
Deep Transfer Learning Based Downlink Channel Prediction for FDD Massive MIMO Systems

Yuwen Yang, Feifei Gao, Zhimeng Zhong et al.

Artificial intelligence (AI) based downlink channel state information (CSI) prediction for frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems has attracted growing attention recently. However, existing works focus on the downlink CSI prediction for the users under a given environment and is hard to adapt to users in new environment especially when labeled data is limited. To address this issue, we formulate the downlink channel prediction as a deep transfer learning (DTL) problem, where each learning task aims to predict the downlink CSI from the uplink CSI for one single environment. Specifically, we develop the direct-transfer algorithm based on the fully-connected neural network architecture, where the network is trained on the data from all previous environments in the manner of classical deep learning and is then fine-tuned for new environments. To further improve the transfer efficiency, we propose the meta-learning algorithm that trains the network by alternating inner-task and across-task updates and then adapts to a new environment with a small number of labeled data. Simulation results show that the direct-transfer algorithm achieves better performance than the deep learning algorithm, which implies that the transfer learning benefits the downlink channel prediction in new environments. Moreover, the meta-learning algorithm significantly outperforms the direct-transfer algorithm in terms of both prediction accuracy and stability, which validates its effectiveness and superiority.

ITSep 29, 2019
Model-aided Deep Neural Network for Source Number Detection

Yuwen Yang, Feifei Gao, Cheng Qian et al.

Source number detection is a critical problem in array signal processing. Conventional model-driven methods e.g., Akaikes information criterion (AIC) and minimum description length (MDL), suffer from severe performance degradation when the number of snapshots is small or the signal-to-noise ratio (SNR) is low. In this paper, we exploit the model-aided based deep neural network (DNN) to estimate the source number. Specifically, we first propose the eigenvalue based regression network (ERNet) and classification network (ECNet) to estimate the number of non-coherent sources, where the eigenvalues of the received signal covariance matrix and the source number are used as the input and the supervise label of the networks, respectively. Then, we extend the ERNet and ECNet for estimating the number of coherent sources, where the forward-backward spatial smoothing (FBSS) scheme is adopted to improve the performance of ERNet and ECNet. Numerical results demonstrate the outstanding performance of ERNet and ECNet over the conventional AIC and MDL methods as well as their excellent generalization capability, which also shows their great potentials for practical applications.

SOC-PHJul 25, 2019
Optimizing vaccine distribution networks in low and middle-income countries

Yuwen Yang, Hoda Bidkhori, Jayant Rajgopal

Vaccination has been proven to be the most effective method to prevent infectious diseases. However, there are still millions of children in low and middle-income countries who are not covered by routine vaccines and remain at risk. The World Health Organization - Expanded Programme on Immunization (WHO-EPI) was designed to provide universal childhood vaccine access for children across the world and in this work, we address the design of the distribution network for WHO-EPI vaccines. In particular, we formulate the network design problem as a mixed integer program (MIP) and present a new algorithm for typical problems that are too large to be solved using commercial MIP software. We test the algorithm using data derived from four different countries in sub-Saharan Africa and show that the algorithm is able to obtain high-quality solutions for even the largest problems within a few minutes.