Weina Wang

LG
h-index5
16papers
245citations
Novelty60%
AI Score45

16 Papers

LGMay 21
Noise Schedule Design for Diffusion Models: An Optimal Control Perspective

Seo Taek Kong, Weina Wang, R. Srikant

We develop a principled framework for analyzing and designing noise schedules in diffusion models. We show that one can recast this design problem as an optimal control problem, whose state is the Fisher information of the diffusion process which evolves according to an ODE and the control input is the noise schedule. The objective of the optimal control problem is a functional involving the Fisher information, which is shown to be an upper bound on the Kullback-Leibler sampling error. By solving this optimal control problem, we obtain sufficient conditions on noise schedules under which state-of-the-art $\tilde{\mathcal{O}} (d/n)$ sampling error is achievable, where $d$ is the data dimension and $n$ is the number of discretization steps. While existing theoretical work also prove that $\tilde{\mathcal{O}}(d/n)$ sampling error bounds are achievable, these results hold for specific noise schedules, which do not include the schedules used in practice. Under a further parametric assumption on the data distribution, we show that one can obtain closed-form expressions for the noise schedules. These noise schedules generalize standard empirical schedules such as exponential and sigmoid schedules by allowing additional parameters that can be tuned. Systematically tuning the parameters of these schedules yields new schedules that achieve superior FID scores on image generation benchmarks.

LGFeb 2, 2024
Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Neharika Jali, Guannan Qu, Weina Wang et al.

We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike homogeneous systems, a threshold policy, that routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. But an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space size, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and despite the low-dimensional parameterization prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ~30% over the greedy policy that routes to the fastest available server.

LGFeb 8, 2024
Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

Yige Hong, Qiaomin Xie, Yudong Chen et al.

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, assuming only a unichain and aperiodicity assumption. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Global Attractor Property (GAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).

LGFeb 9, 2025
Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Xiangcheng Zhang, Yige Hong, Weina Wang

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.

OCOct 19, 2024
Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation

Chen Yan, Weina Wang, Lei Ying

We study the finite-horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms. Prior work has shown that when an RMAB satisfies a non-degeneracy condition, Linear-Programming-based (LP-based) policies derived from the fluid approximation, which captures the mean dynamics of the system, achieve an exponentially small optimality gap. However, it is common for RMABs to be degenerate, in which case LP-based policies can result in a $Θ(1/\sqrt{N})$ optimality gap per arm. In this paper, we propose a novel Stochastic-Programming-based (SP-based) policy that, under a uniqueness assumption, achieves an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs. Our approach is based on the construction of a Gaussian stochastic system that captures not only the mean but also the variance of the RMAB dynamics, resulting in a more accurate approximation than the fluid approximation. We then solve a stochastic program for this system to obtain our policy. This is the first result to establish an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs.

LGMay 31, 2023
Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption

Yige Hong, Qiaomin Xie, Yudong Chen et al.

We study the infinite-horizon restless bandit problem with the average reward criterion, in both discrete-time and continuous-time settings. A fundamental goal is to efficiently compute policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework, Follow-the-Virtual-Advice, that converts any single-armed policy into a policy for the original $N$-armed problem. This is done by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that violate UGAP. More notably, in the continuous-time setting, we do not require \emph{any} additional assumptions beyond the standard unichain condition. In both settings, our work is the first asymptotic optimality result that does not require UGAP.

LGMay 25, 2023
Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks

Honghao Wei, Xin Liu, Weina Wang et al.

This paper considers a class of reinforcement learning problems, which involve systems with two types of states: stochastic and pseudo-stochastic. In such systems, stochastic states follow a stochastic transition kernel while the transitions of pseudo-stochastic states are deterministic given the stochastic states/transitions. We refer to such systems as mixed systems, which are widely used in various applications, including manufacturing systems, communication networks, and queueing networks. We propose a sample efficient RL method that accelerates learning by generating augmented data samples. The proposed algorithm is data-driven and learns the policy from data samples from both real and augmented samples. This method significantly improves learning by reducing the sample complexity such that the dataset only needs to have sufficient coverage of the stochastic states. We analyze the sample complexity of the proposed method under Fitted Q Iteration (FQI) and demonstrate that the optimality gap decreases as $\tilde{\mathcal{O}}(\sqrt{{1}/{n}}+\sqrt{{1}/{m}}),$ where $n$ is the number of real samples and $m$ is the number of augmented samples per real sample. It is important to note that without augmented samples, the optimality gap is $\tilde{\mathcal{O}}(1)$ due to insufficient data coverage of the pseudo-stochastic states. Our experimental results on multiple queueing network applications confirm that the proposed method indeed significantly accelerates learning in both deep Q-learning and deep policy gradient.

CRJan 27, 2022
Calibration with Privacy in Peer Review

Wenxin Ding, Gautam Kamath, Weina Wang et al.

Reviewers in peer review are often miscalibrated: they may be strict, lenient, extreme, moderate, etc. A number of algorithms have previously been proposed to calibrate reviews. Such attempts of calibration can however leak sensitive information about which reviewer reviewed which paper. In this paper, we identify this problem of calibration with privacy, and provide a foundational building block to address it. Specifically, we present a theoretical study of this problem under a simplified-yet-challenging model involving two reviewers, two papers, and an MAP-computing adversary. Our main results establish the Pareto frontier of the tradeoff between privacy (preventing the adversary from inferring reviewer identity) and utility (accepting better papers), and design explicit computationally-efficient algorithms that we prove are Pareto optimal.

SYJun 8, 2021
Job Dispatching Policies for Queueing Systems with Unknown Service Rates

Tuhinangshu Choudhury, Gauri Joshi, Weina Wang et al.

In multi-server queueing systems where there is no central queue holding all incoming jobs, job dispatching policies are used to assign incoming jobs to the queue at one of the servers. Classic job dispatching policies such as join-the-shortest-queue and shortest expected delay assume that the service rates and queue lengths of the servers are known to the dispatcher. In this work, we tackle the problem of job dispatching without the knowledge of service rates and queue lengths, where the dispatcher can only obtain noisy estimates of the service rates by observing job departures. This problem presents a novel exploration-exploitation trade-off between sending jobs to all the servers to estimate their service rates, and exploiting the currently known fastest servers to minimize the expected queueing delay. We propose a bandit-based exploration policy that learns the service rates from observed job departures. Unlike the standard multi-armed bandit problem where only one out of a finite set of actions is optimal, here the optimal policy requires identifying the optimal fraction of incoming jobs to be sent to each server. We present a regret analysis and simulations to demonstrate the effectiveness of the proposed bandit-based exploration policy.

CRJun 29, 2020
On the Privacy-Utility Tradeoff in Peer-Review Data Analysis

Wenxin Ding, Nihar B. Shah, Weina Wang

A major impediment to research on improving peer review is the unavailability of peer-review data, since any release of such data must grapple with the sensitivity of the peer review data in terms of protecting identities of reviewers from authors. We posit the need to develop techniques to release peer-review data in a privacy-preserving manner. Identifying this problem, in this paper we propose a framework for privacy-preserving release of certain conference peer-review data -- distributions of ratings, miscalibration, and subjectivity -- with an emphasis on the accuracy (or utility) of the released data. The crux of the framework lies in recognizing that a part of the data pertaining to the reviews is already available in public, and we use this information to post-process the data released by any privacy mechanism in a manner that improves the accuracy (utility) of the data while retaining the privacy guarantees. Our framework works with any privacy-preserving mechanism that operates via releasing perturbed data. We present several positive and negative theoretical results, including a polynomial-time algorithm for improving on the privacy-utility tradeoff.

CRSep 6, 2019
Privacy-Utility Tradeoffs in Routing Cryptocurrency over Payment Channel Networks

Weizhao Tang, Weina Wang, Giulia Fanti et al.

Payment channel networks (PCNs) are viewed as one of the most promising scalability solutions for cryptocurrencies today. Roughly, PCNs are networks where each node represents a user and each directed, weighted edge represents funds escrowed on a blockchain; these funds can be transacted only between the endpoints of the edge. Users efficiently transmit funds from node A to B by relaying them over a path connecting A to B, as long as each edge in the path contains enough balance (escrowed funds) to support the transaction. Whenever a transaction succeeds, the edge weights are updated accordingly. However, in deployed PCNs, channel balances (i.e., edge weights) are not revealed to users for privacy reasons; users know only the initial weights at time 0. Hence, when routing transactions, users first guess a path, then check if it supports the transaction. This guess-and-check process dramatically reduces the success rate of transactions. At the other extreme, knowing full channel balances can give substantial improvements in success rate at the expense of privacy. In this work, we study whether a network can reveal noisy channel balances to trade off privacy for utility. We show fundamental limits on such a tradeoff, and propose noise mechanisms that achieve the fundamental limit for a general class of graph topologies. Our results suggest that in practice, PCNs should operate either in the low-privacy or low-utility regime; it is not possible to get large gains in utility by giving up a little privacy, or large gains in privacy by sacrificing a little utility.

SIMar 4, 2019
QuickStop: A Markov Optimal Stopping Approach for Quickest Misinformation Detection

Honghao Wei, Xiaohan Kang, Weina Wang et al.

This paper combines data-driven and model-driven methods for real-time misinformation detection. Our algorithm, named QuickStop, is an optimal stopping algorithm based on a probabilistic information spreading model obtained from labeled data. The algorithm consists of an offline machine learning algorithm for learning the probabilistic information spreading model and an online optimal stopping algorithm to detect misinformation. The online detection algorithm has both low computational and memory complexities. Our numerical evaluations with a real-world dataset show that QuickStop outperforms existing misinformation detection algorithms in terms of both accuracy and detection time (number of observations needed for detection). Our evaluations with synthetic data further show that QuickStop is robust to (offline) learning errors.

LGJan 25, 2019
Almost Boltzmann Exploration

Harsh Gupta, Seo Taek Kong, R. Srikant et al.

Boltzmann exploration is widely used in reinforcement learning to provide a trade-off between exploration and exploitation. Recently, in (Cesa-Bianchi et al., 2017) it has been shown that pure Boltzmann exploration does not perform well from a regret perspective, even in the simplest setting of stochastic multi-armed bandit (MAB) problems. In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+α} T)$ regret for a stochastic MAB problem with $K$ arms, where $α>0$ is a parameter of the algorithm. This improves on the result in (Cesa-Bianchi et al., 2017), where an algorithm inspired by the Gumbel-softmax trick achieves $O(K\log^2 T)$ regret. We also show that our algorithm achieves $O(β(G) \log^{1+α} T)$ regret in stochastic MAB problems with graph-structured feedback, without knowledge of the graph structure, where $β(G)$ is the independence number of the feedback graph. Additionally, we present extensive experimental results on real datasets and applications for multi-armed bandits with both traditional bandit feedback and graph-structured feedback. In all cases, our algorithm performs as well or better than the state-of-the-art.

GTMar 22, 2016
The Value of Privacy: Strategic Data Subjects, Incentive Mechanisms and Fundamental Limits

Weina Wang, Lei Ying, Junshan Zhang

We study the value of data privacy in a game-theoretic model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The private data of each individual represents her knowledge about an underlying state, which is the information that the data collector desires to learn. Different from most of the existing work on privacy-aware surveys, our model does not assume the data collector to be trustworthy. Then, an individual takes full control of its own data privacy and reports only a privacy-preserving version of her data. In this paper, the value of $ε$ units of privacy is measured by the minimum payment of all nonnegative payment mechanisms, under which an individual's best response at a Nash equilibrium is to report the data with a privacy level of $ε$. The higher $ε$ is, the less private the reported data is. We derive lower and upper bounds on the value of privacy which are asymptotically tight as the number of data subjects becomes large. Specifically, the lower bound assures that it is impossible to use less amount of payment to buy $ε$ units of privacy, and the upper bound is given by an achievable payment mechanism that we designed. Based on these fundamental limits, we further derive lower and upper bounds on the minimum total payment for the data collector to achieve a given learning accuracy target, and show that the total payment of the designed mechanism is at most one individual's payment away from the minimum.

CRFeb 16, 2014
On the Relation Between Identifiability, Differential Privacy and Mutual-Information Privacy

Weina Wang, Lei Ying, Junshan Zhang

This paper investigates the relation between three different notions of privacy: identifiability, differential privacy and mutual-information privacy. Under a unified privacy-distortion framework, where the distortion is defined to be the Hamming distance of the input and output databases, we establish some fundamental connections between these three privacy notions. Given a distortion level $D$, define $ε_{\mathrm{i}}^*(D)$ to be the smallest (best) identifiability level, and $ε_{\mathrm{d}}^*(D)$ to be the smallest differential privacy level. We characterize $ε_{\mathrm{i}}^*(D)$ and $ε_{\mathrm{d}}^*(D)$, and prove that $ε_{\mathrm{i}}^*(D)-ε_X\leε_{\mathrm{d}}^*(D)\leε_{\mathrm{i}}^*(D)$ for $D$ in some range, where $ε_X$ is a constant depending on the distribution of the original database $X$, and diminishes to zero when the distribution of $X$ is uniform. Furthermore, we show that identifiability and mutual-information privacy are consistent in the sense that given distortion level $D$, the mechanism that optimizes the mutual-information privacy also minimizes the identifiability level.

CRFeb 14, 2014
A Minimax Distortion View of Differentially Private Query Release

Weina Wang, Lei Ying, Junshan Zhang

We consider the problem of differentially private query release through a synthetic database approach. Departing from the existing approaches that require the query set to be specified in advance, we advocate to devise query-set independent mechanisms, with an ambitious goal of providing accurate answers, while meeting the privacy constraints, for all queries in a general query class. Specifically, a differentially private mechanism is constructed to "encode" rich stochastic structure into the synthetic database, and "customized" companion estimators are then derived to provide accurate answers by making use of all available information, including the mechanism (which is public information) and the query functions. Accordingly, the distortion under the best of this kind of mechanisms at the worst-case query in a general query class, so called the minimax distortion, provides a fundamental characterization of differentially private query release. For the general class of statistical queries, we prove that with the squared-error distortion measure, the minimax distortion is $O(1/n)$ by deriving asymptotically tight upper and lower bounds in the regime that the database size $n$ goes to infinity. The upper bound is achievable by a mechanism $\mathcal{E}$ and its corresponding companion estimators, which points directly to the feasibility of the proposed approach in large databases. We further evaluate the mechanism $\mathcal{E}$ and the companion estimators through experiments on real datasets from Netflix and Facebook. Experimental results show improvement over the state-of-art MWEM algorithm and verify the scaling behavior $O(1/n)$ of the minimax distortion.