Qunwei Li

h-index22

18papers

331citations

Novelty56%

AI Score41

Ranked #91,273 of 201,018 authors (top 45%)#20,497 in LG (top 48%)

18 Papers

LGApr 12, 2023

Edge-cloud Collaborative Learning with Federated and Centralized Features

Zexi Li, Qunwei Li, Yi Zhou et al.

Federated learning (FL) is a popular way of edge computing that doesn't compromise users' privacy. Current FL paradigms assume that data only resides on the edge, while cloud servers only perform model averaging. However, in real-life situations such as recommender systems, the cloud server has the ability to store historical and interactive features. In this paper, our proposed Edge-Cloud Collaborative Knowledge Transfer Framework (ECCT) bridges the gap between the edge and cloud, enabling bi-directional knowledge transfer between both, sharing feature embeddings and prediction logits. ECCT consolidates various benefits, including enhancing personalization, enabling model heterogeneity, tolerating training asynchronization, and relieving communication burdens. Extensive experiments on public and industrial datasets demonstrate ECCT's effectiveness and potential for use in academia and industry.

LGSep 12, 2025Code

FedBiF: Communication-Efficient Federated Learning via Bits Freezing

Shiwei Li, Qunwei Li, Haozhao Wang et al.

Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative model training without sharing local data. Despite its advantages, FL suffers from substantial communication overhead, which can affect training efficiency. Recent efforts have mitigated this issue by quantizing model updates to reduce communication costs. However, most existing methods apply quantization only after local training, introducing quantization errors into the trained parameters and potentially degrading model accuracy. In this paper, we propose Federated Bit Freezing (FedBiF), a novel FL framework that directly learns quantized model parameters during local training. In each communication round, the server first quantizes the model parameters and transmits them to the clients. FedBiF then allows each client to update only a single bit of the multi-bit parameter representation, freezing the remaining bits. This bit-by-bit update strategy reduces each parameter update to one bit while maintaining high precision in parameter representation. Extensive experiments are conducted on five widely used datasets under both IID and Non-IID settings. The results demonstrate that FedBiF not only achieves superior communication compression but also promotes sparsity in the resulting models. Notably, FedBiF attains accuracy comparable to FedAvg, even when using only 1 bit-per-parameter (bpp) for uplink and 3 bpp for downlink communication. The code is available at https://github.com/Leopold1423/fedbif-tpds25.

LGMar 9, 2024

Towards Efficient Replay in Federated Incremental Learning

Yichen Li, Qunwei Li, Haozhao Wang et al.

In Federated Learning (FL), the data in each client is typically assumed fixed or static. However, data often comes in an incremental manner in real-world applications, where the data domain may increase dynamically. In this work, we study catastrophic forgetting with data heterogeneity in Federated Incremental Learning (FIL) scenarios where edge clients may lack enough storage space to retain full data. We propose to employ a simple, generic framework for FIL named Re-Fed, which can coordinate each client to cache important samples for replay. More specifically, when a new task arrives, each client first caches selected previous samples based on their global and local importance. Then, the client trains the local model with both the cached samples and the samples from the new task. Theoretically, we analyze the ability of Re-Fed to discover important samples for replay thus alleviating the catastrophic forgetting problem. Moreover, we empirically show that Re-Fed achieves competitive performance compared to state-of-the-art methods.

IRFeb 5, 2024

Denoising Time Cycle Modeling for Recommendation

Sicong Xie, Qunwei Li, Weidi Xu et al.

Recently, modeling temporal patterns of user-item interactions have attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns of user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noises, which limits the performance of target-related time cycle modeling and affect the recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach to denoise user behaviors and select the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over the state-of-the-art recommendation methods.

IRApr 5, 2025

Towards Principled Learning for Re-ranking in Recommender Systems

Qunwei Li, Linghui Li, Jianbin Lin et al.

As the final stage of recommender systems, re-ranking presents ordered item lists to users that best match their interests. It plays such a critical role and has become a trending research topic with much attention from both academia and industry. Recent advances of re-ranking are focused on attentive listwise modeling of interactions and mutual influences among items to be re-ranked. However, principles to guide the learning process of a re-ranker, and to measure the quality of the output of the re-ranker, have been always missing. In this paper, we study such principles to learn a good re-ranker. Two principles are proposed, including convergence consistency and adversarial consistency. These two principles can be applied in the learning of a generic re-ranker and improve its performance. We validate such a finding by various baseline methods over different datasets.

OCOct 14, 2021

A Cubic Regularization Approach for Finding Local Minimax Points in Nonconvex Minimax Optimization

Ziyi Chen, Zhengyang Hu, Qunwei Li et al.

Gradient descent-ascent (GDA) is a widely used algorithm for minimax optimization. However, GDA has been proved to converge to stationary points for nonconvex minimax optimization, which are suboptimal compared with local minimax points. In this work, we develop cubic regularization (CR) type algorithms that globally converge to local minimax points in nonconvex-strongly-concave minimax optimization. We first show that local minimax points are equivalent to second-order stationary points of a certain envelope function. Then, inspired by the classic cubic regularization algorithm, we propose an algorithm named Cubic-LocalMinimax for finding local minimax points, and provide a comprehensive convergence analysis by leveraging its intrinsic potential function. Specifically, we establish the global convergence of Cubic-LocalMinimax to a local minimax point at a sublinear convergence rate and characterize its iteration complexity. Also, we propose a GDA-based solver for solving the cubic subproblem involved in Cubic-LocalMinimax up to certain pre-defined accuracy, and analyze the overall gradient and Hessian-vector product computation complexities of such an inexact Cubic-LocalMinimax algorithm. Moreover, we propose a stochastic variant of Cubic-LocalMinimax for large-scale minimax optimization, and characterize its sample complexity under stochastic sub-sampling. Experimental results demonstrate faster convergence of our stochastic Cubic-LocalMinimax than some existing algorithms.

LGDec 7, 2020

Learning Graph Neural Networks with Approximate Gradient Descent

Qunwei Li, Shaofeng Zou, Wenliang Zhong

The first provably efficient algorithm for learning graph neural networks (GNNs) with one hidden layer for node information convolution is provided in this paper. Two types of GNNs are investigated, depending on whether labels are attached to nodes or graphs. A comprehensive framework for designing and analyzing convergence of GNN training algorithms is developed. The algorithm proposed is applicable to a wide range of activation functions including ReLU, Leaky ReLU, Sigmod, Softplus and Swish. It is shown that the proposed algorithm guarantees a linear convergence rate to the underlying true parameters of GNNs. For both types of GNNs, sample complexity in terms of the number of nodes or the number of graphs is characterized. The impact of feature dimension and GNN structure on the convergence rate is also theoretically characterized. Numerical experiments are further provided to validate our theoretical analysis.

HCSep 3, 2019

Prospect Theory Based Crowdsourcing for Classification in the Presence of Spammers

Baocheng Geng, Qunwei Li, Pramod K. Varshney

We consider the $M$-ary classification problem via crowdsourcing, where crowd workers respond to simple binary questions and the answers are aggregated via decision fusion. The workers have a reject option to skip answering a question when they do not have the expertise, or when the confidence of answering that question correctly is low. We further consider that there are spammers in the crowd who respond to the questions with random guesses. Under the payment mechanism that encourages the reject option, we study the behavior of honest workers and spammers, whose objectives are to maximize their monetary rewards. To accurately characterize human behavioral aspects, we employ prospect theory to model the rationality of the crowd workers, whose perception of costs and probabilities are distorted based on some value and weight functions, respectively. Moreover, we estimate the number of spammers and employ a weighted majority voting decision rule, where we assign an optimal weight for every worker to maximize the system performance. The probability of correct classification and asymptotic system performance are derived. We also provide simulation results to demonstrate the effectiveness of our approach.

LGJun 6, 2019

A Look at the Effect of Sample Design on Generalization through the Lens of Spectral Analysis

Bhavya Kailkhura, Jayaraman J. Thiagarajan, Qunwei Li et al.

This paper provides a general framework to study the effect of sampling properties of training data on the generalization error of the learned machine learning (ML) models. Specifically, we propose a new spectral analysis of the generalization error, expressed in terms of the power spectra of the sampling pattern and the function involved. The framework is build in the Euclidean space using Fourier analysis and establishes a connection between some high dimensional geometric objects and optimal spectral form of different state-of-the-art sampling patterns. Subsequently, we estimate the expected error bounds and convergence rate of different state-of-the-art sampling patterns, as the number of samples and dimensions increase. We make several observations about generalization error which are valid irrespective of the approximation scheme (or learning architecture) and training (or optimization) algorithms. Our result also sheds light on ways to formulate design principles for constructing optimal sampling methods for particular problems.

LGNov 22, 2018

MR-GAN: Manifold Regularized Generative Adversarial Networks

Qunwei Li, Bhavya Kailkhura, Rushil Anirudh et al.

Despite the growing interest in generative adversarial networks (GANs), training GANs remains a challenging problem, both from a theoretical and a practical standpoint. To address this challenge, in this paper, we propose a novel way to exploit the unique geometry of the real data, especially the manifold information. More specifically, we design a method to regularize GAN training by adding an additional regularization term referred to as manifold regularizer. The manifold regularizer forces the generator to respect the unique geometry of the real data manifold and generate high quality data. Furthermore, we theoretically prove that the addition of this regularization term in any class of GANs including DCGAN and Wasserstein GAN leads to improved performance in terms of generalization, existence of equilibrium, and stability. Preliminary experiments show that the proposed manifold regularization helps in avoiding mode collapse and leads to stable training.

LGJul 31, 2018

K-medoids Clustering of Data Sequences with Composite Distributions

Tiexing Wang, Qunwei Li, Donald J. Bucci et al.

This paper studies clustering of data sequences using the k-medoids algorithm. All the data sequences are assumed to be generated from \emph{unknown} continuous distributions, which form clusters with each cluster containing a composite set of closely located distributions (based on a certain distance metric between distributions). The maximum intra-cluster distance is assumed to be smaller than the minimum inter-cluster distance, and both values are assumed to be known. The goal is to group the data sequences together if their underlying generative distributions (which are unknown) belong to one cluster. Distribution distance metrics based k-medoids algorithms are proposed for known and unknown number of distribution clusters. Upper bounds on the error probability and convergence results in the large sample regime are also provided. It is shown that the error probability decays exponentially fast as the number of samples in each data sequence goes to infinity. The error exponent has a simple form regardless of the distance metric applied when certain conditions are satisfied. In particular, the error exponent is characterized when either the Kolmogrov-Smirnov distance or the maximum mean discrepancy are used as the distance metric. Simulation results are provided to validate the analysis.

LGMay 1, 2018

Decision Tree Design for Classification in Crowdsourcing Systems

Baocheng Geng, Qunwei Li, Pramod K. Varshney

In this paper, we present a novel sequential paradigm for classification in crowdsourcing systems. Considering that workers are unreliable and they perform the tests with errors, we study the construction of decision trees so as to minimize the probability of mis-classification. By exploiting the connection between probability of mis-classification and entropy at each level of the decision tree, we propose two algorithms for decision tree design. Furthermore, the worker assignment problem is studied when workers can be assigned to different tests of the decision tree to provide a trade-off between classification cost and resulting error performance. Numerical results are presented for illustration.

HCOct 26, 2017

Optimal Crowdsourced Classification with a Reject Option in the Presence of Spammers

Qunwei Li, Pramod K. Varshney

LGOct 14, 2017

Robust Decentralized Learning Using ADMM with Unreliable Agents

Qunwei Li, Bhavya Kailkhura, Ryan Goldhahn et al.

Many machine learning problems can be formulated as consensus optimization problems which can be solved efficiently via a cooperative multi-agent system. However, the agents in the system can be unreliable due to a variety of reasons: noise, faults and attacks. Providing erroneous updates leads the optimization process in a wrong direction, and degrades the performance of distributed machine learning algorithms. This paper considers the problem of decentralized learning using ADMM in the presence of unreliable agents. First, we rigorously analyze the effect of erroneous updates (in ADMM learning iterations) on the convergence behavior of multi-agent system. We show that the algorithm linearly converges to a neighborhood of the optimal solution under certain conditions and characterize the neighborhood size analytically. Next, we provide guidelines for network design to achieve a faster convergence. We also provide conditions on the erroneous updates for exact convergence to the optimal solution. Finally, to mitigate the influence of unreliable agents, we propose \textsf{ROAD}, a robust variant of ADMM, and show its resilience to unreliable agents with an exact convergence to the optimum.

LGMay 14, 2017

Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization

Qunwei Li, Yi Zhou, Yingbin Liang et al.

In many modern machine learning applications, structures of underlying mathematical models often yield nonconvex optimization problems. Due to the intractability of nonconvexity, there is a rising need to develop efficient methods for solving general nonconvex problems with certain performance guarantee. In this work, we investigate the accelerated proximal gradient method for nonconvex programming (APGnc). The method compares between a usual proximal gradient step and a linear extrapolation step, and accepts the one that has a lower function value to achieve a monotonic decrease. In specific, under a general nonsmooth and nonconvex setting, we provide a rigorous argument to show that the limit points of the sequence generated by APGnc are critical points of the objective function. Then, by exploiting the Kurdyka-Łojasiewicz (\KL) property for a broad class of functions, we establish the linear and sub-linear convergence rates of the function value sequence generated by APGnc. We further propose a stochastic variance reduced APGnc (SVRG-APGnc), and establish its linear convergence under a special case of the \KL property. We also extend the analysis to the inexact version of these methods and develop an adaptive momentum strategy that improves the numerical performance.

SIApr 3, 2017

Does Confidence Reporting from the Crowd Benefit Crowdsourcing Performance?

Qunwei Li, Pramod K. Varshney

We explore the design of an effective crowdsourcing system for an $M$-ary classification task. Crowd workers complete simple binary microtasks whose results are aggregated to give the final classification decision. We consider the scenario where the workers have a reject option so that they are allowed to skip microtasks when they are unable to or choose not to respond to binary microtasks. Additionally, the workers report quantized confidence levels when they are able to submit definitive answers. We present an aggregation approach using a weighted majority voting rule, where each worker's response is assigned an optimized weight to maximize crowd's classification performance. We obtain a couterintuitive result that the classification performance does not benefit from workers reporting quantized confidence. Therefore, the crowdsourcing system designer should employ the reject option without requiring confidence reporting.

SINov 30, 2016

Influential Node Detection in Implicit Social Networks using Multi-task Gaussian Copula Models

Qunwei Li, Bhavya Kailkhura, Jayaraman J. Thiagarajan et al.

Influential node detection is a central research topic in social network analysis. Many existing methods rely on the assumption that the network structure is completely known \textit{a priori}. However, in many applications, network structure is unavailable to explain the underlying information diffusion phenomenon. To address the challenge of information diffusion analysis with incomplete knowledge of network structure, we develop a multi-task low rank linear influence model. By exploiting the relationships between contagions, our approach can simultaneously predict the volume (i.e. time series prediction) for each contagion (or topic) and automatically identify the most influential nodes for each contagion. The proposed model is validated using synthetic data and an ISIS twitter dataset. In addition to improving the volume prediction performance significantly, we show that the proposed approach can reliably infer the most influential users for specific contagions.

LGFeb 1, 2016

Multi-object Classification via Crowdsourcing with a Reject Option

Qunwei Li, Aditya Vempaty, Lav R. Varshney et al.

Consider designing an effective crowdsourcing system for an $M$-ary classification task. Crowd workers complete simple binary microtasks whose results are aggregated to give the final result. We consider the novel scenario where workers have a reject option so they may skip microtasks when they are unable or choose not to respond. For example, in mismatched speech transcription, workers who do not know the language may not be able to respond to microtasks focused on phonological dimensions outside their categorical perception. We present an aggregation approach using a weighted majority voting rule, where each worker's response is assigned an optimized weight to maximize the crowd's classification performance. We evaluate system performance in both exact and asymptotic forms. Further, we consider the setting where there may be a set of greedy workers that complete microtasks even when they are unable to perform it reliably. We consider an oblivious and an expurgation strategy to deal with greedy workers, developing an algorithm to adaptively switch between the two based on the estimated fraction of greedy workers in the anonymous crowd. Simulation results show improved performance compared with conventional majority voting.