SYJul 28, 2018
Communication-efficient Distributed Multi-resource AllocationSyed Eqbal Alam, Robert Shorten, Fabian Wirth et al.
In several smart city applications, multiple resources must be allocated among competing agents that are coupled through such shared resources and are constrained --- either through limitations of communication infrastructure or privacy considerations. We propose a distributed algorithm to solve such distributed multi-resource allocation problems with no direct inter-agent communication. We do so by extending a recently introduced additive-increase multiplicative-decrease (AIMD) algorithm, which only uses very little communication between the system and agents. Namely, a control unit broadcasts a one-bit signal to agents whenever one of the allocated resources exceeds capacity. Agents then respond to this signal in a probabilistic manner. In the proposed algorithm, each agent makes decision of its resource demand locally and an agent is unaware of the resource allocation of other agents. In empirical results, we observe that the average allocations converge over time to optimal allocations.
SYMar 25, 2019
Distributed Algorithms for Internet-of-Things-enabled Prosumer Markets: A Control Theoretic PerspectiveSyed Eqbal Alam, Robert Shorten, Fabian Wirth et al.
Internet-of-Things (IoT) enables the development of sharing economy applications. In many sharing economy scenarios, agents both produce as well as consume a resource; we call them prosumers. A community of prosumers agrees to sell excess resource to another community in a prosumer market. In this chapter, we propose a control theoretic approach to regulate the number of prosumers in a prosumer community, where each prosumer has a cost function that is coupled through its time-averaged production and consumption of the resource. Furthermore, each prosumer runs its distributed algorithm and takes only binary decisions in a probabilistic way, whether to produce one unit of the resource or not and to consume one unit of the resource or not. In the proposed approach, prosumers do not explicitly exchange information with each other due to privacy reasons, but little exchange of information is required for feedback signals, broadcast by a central agency. In the proposed approach, prosumers achieve the optimal values asymptotically. Furthermore, the proposed approach is suitable to implement in an IoT context with minimal demands on infrastructure. We describe two use cases; community-based car sharing and collaborative energy storage for prosumer markets. We also present simulation results to check the efficacy of the algorithms.
SYDec 21, 2018
Derandomized Distributed Multi-resource Allocation with Little Communication OverheadSyed Eqbal Alam, Robert Shorten, Fabian Wirth et al.
We study a class of distributed optimization problems for multiple shared resource allocation in Internet-connected devices. We propose a derandomized version of an existing stochastic additive-increase and multiplicative-decrease (AIMD) algorithm. The proposed solution uses one bit feedback signal for each resource between the system and the Internet-connected devices and does not require inter-device communication. Additionally, the Internet-connected devices do not compromise their privacy and the solution does not dependent on the number of participating devices. In the system, each Internet-connected device has private cost functions which are strictly convex, twice continuously differentiable and increasing. We show empirically that the long-term average allocations of multiple shared resources converge to optimal allocations and the system achieves minimum social cost. Furthermore, we show that the proposed derandomized AIMD algorithm converges faster than the stochastic AIMD algorithm and both the approaches provide approximately same solutions.
OCDec 2, 2015
Central-limit approach to risk-aware Markov decision processesPengqian Yu, Jia Yuan Yu, Huan Xu
Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated to a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
GTApr 16, 2018
Distributed, Private, and Derandomized Allocation Algorithm for EV ChargingHamid Nabati, Jia Yuan Yu
Efficient resource allocation is challenging when privacy of users is important. Distributed approaches have recently been used extensively to find a solution for such problems. In this work, the efficiency of distributed AIMD algorithm for allocation of subsidized goods is studied. First, a suitable utility function is assigned to each user describing the amount of satisfaction that it has from allocated resource. Then the resource allocation is defined as a total utilitarianism problem that is an optimization problem of sum of users utility functions subjected to capacity constraint. Recently, a stochastic state-dependent variant of AIMD algorithm is used for allocation of common goods among users with strictly increasing and concave utility functions. Here, the stochastic AIMD algorithm is derandomized and its efficiency is compared with the stochastic version. Moreover, the algorithm is improved to allocate subsidized goods to users with concave and non-monotone utility functions as well as users with Sigmoidal utility functions. To illustrate the effectiveness of the proposed solutions, simulation results is presented for a public renewable-energy powered charging station in which the electric vehicles (EV) compete to be recharged.
AIApr 21, 2024
Error Analysis of Shapley Value-Based Model Explanations: An Informative PerspectiveNingsheng Zhao, Jia Yuan Yu, Krzysztof Dzieciolowski et al.
Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement SVAs have some drawbacks, resulting in biased or unreliable explanations that fail to correctly capture the true intrinsic relationships between features and model outputs. Moreover, the mechanism and consequences of these drawbacks have not been discussed systematically. In this paper, we propose a novel error theoretical analysis framework, in which the explanation errors of SVAs are decomposed into two components: observation bias and structural bias. We further clarify the underlying causes of these two biases and demonstrate that there is a trade-off between them. Based on this error analysis framework, we develop two novel concepts: over-informative and underinformative explanations. We demonstrate how these concepts can be effectively used to understand potential errors of existing SVA methods. In particular, for the widely deployed assumption-based SVAs, we find that they can easily be under-informative due to the distribution drift caused by distributional assumptions. We propose a measurement tool to quantify such a distribution drift. Finally, our experiments illustrate how different existing SVA methods can be over- or under-informative. Our work sheds light on how errors incur in the estimation of SVAs and encourages new less error-prone methods.
MLMay 11, 2025
Outperformance Score: A Universal Standardization Method for Confusion-Matrix-Based Classification Performance MetricsNingsheng Zhao, Trang Bui, Jia Yuan Yu et al.
Many classification performance metrics exist, each suited to a specific application. However, these metrics often differ in scale and can exhibit varying sensitivity to class imbalance rates in the test set. As a result, it is difficult to use the nominal values of these metrics to interpret and evaluate classification performances, especially when imbalance rates vary. To address this problem, we introduce the outperformance score function, a universal standardization method for confusion-matrix-based classification performance (CMBCP) metrics. It maps any given metric to a common scale of $[0,1]$, while providing a clear and consistent interpretation. Specifically, the outperformance score represents the percentile rank of the observed classification performance within a reference distribution of possible performances. This unified framework enables meaningful comparison and monitoring of classification performance across test sets with differing imbalance rates. We illustrate how the outperformance scores can be applied to a variety of commonly used classification performance metrics and demonstrate the robustness of our method through experiments on real-world datasets spanning multiple classification applications.
MLFeb 5
Causal Inference on Stopped Random Walks in Online AdvertisingJia Yuan Yu
We consider a causal inference problem frequently encountered in online advertising systems, where a publisher (e.g., Instagram, TikTok) interacts repeatedly with human users and advertisers by sporadically displaying to each user an advertisement selected through an auction. Each treatment corresponds to a parameter value of the advertising mechanism (e.g., auction reserve-price), and we want to estimate through experiments the corresponding long-term treatment effect (e.g., annual advertising revenue). In our setting, the treatment affects not only the instantaneous revenue from showing an ad, but also changes each user's interaction-trajectory, and each advertiser's bidding policy -- as the latter is constrained by a finite budget. In particular, each a treatment may even affect the size of the population, since users interact longer with a tolerable advertising mechanism. We drop the classical i.i.d. assumption and model the experiment measurements (e.g., advertising revenue) as a stopped random walk, and use a budget-splitting experimental design, the Anscombe Theorem, a Wald-like equation, and a Central Limit Theorem to construct confidence intervals for the long-term treatment effect.
CLFeb 19, 2022
Reward Modeling for Mitigating Toxicity in Transformer-based Language ModelsFarshid Faal, Ketra Schmitt, Jia Yuan Yu
Transformer-based language models are able to generate fluent text and be efficiently adapted across various natural language generation tasks. However, language models that are pretrained on large unlabeled web text corpora have been shown to suffer from degenerating toxic content and social bias behaviors, consequently hindering their safe deployment. Various detoxification methods were proposed to mitigate the language model's toxicity; however, these methods struggled to detoxify language models when conditioned on prompts that contain specific social identities related to gender, race, or religion. In this study, we propose Reinforce-Detoxify; A reinforcement learning-based method for mitigating toxicity in language models. We address the challenge of safety in language models and propose a new reward model that is able to detect toxic content and mitigate unintended bias towards social identities in toxicity prediction. The experiments demonstrate that the Reinforce-Detoxify method for language model detoxification outperforms existing detoxification approaches in automatic evaluation metrics, indicating the ability of our approach in language model detoxification and less prone to unintended bias toward social identities in generated content.
OCApr 26, 2021
Multi-resource allocation for federated settings: A non-homogeneous Markov chain modelSyed Eqbal Alam, Fabian Wirth, Jia Yuan Yu
In a federated setting, agents coordinate with a central agent or a server to solve an optimization problem in which agents do not share their information with each other. Wirth and his co-authors, in a recent paper, describe how the basic additive-increase multiplicative-decrease (AIMD) algorithm can be modified in a straightforward manner to solve a class of optimization problems for federated settings for a single shared resource with no inter-agent communication. The AIMD algorithm is one of the most successful distributed resource allocation algorithms currently deployed in practice. It is best known as the backbone of the Internet and is also widely explored in other application areas. We extend the single-resource algorithm to multiple heterogeneous shared resources that emerge in smart cities, sharing economy, and many other applications. Our main results show the convergence of the average allocations to the optimal values. We model the system as a non-homogeneous Markov chain with place-dependent probabilities. Furthermore, simulation results are presented to demonstrate the efficacy of the algorithms and to highlight the main features of our analysis.
MEMar 8, 2021
Bias-Corrected Peaks-Over-Threshold Estimation of the CVaRDylan Troop, Frédéric Godin, Jia Yuan Yu
The conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, energy, etc. When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well due to limited data above the value-at-risk (VaR), the quantile corresponding to the CVaR level. To mitigate this problem, the CVaR can be estimated by extrapolating above a lower threshold than the VaR using a generalized Pareto distribution (GPD), which is often referred to as the peaks-over-threshold (POT) approach. This method often requires a very high threshold to fit well, leading to high variance in estimation, and can induce significant bias if the threshold is chosen too low. In this paper, we derive a new expression for the GPD approximation error of the CVaR, a bias term induced by the choice of threshold, as well as a bias correction method for the estimated GPD parameters. This leads to the derivation of a new estimator for the CVaR that we prove to be asymptotically unbiased. In a practical setting, we show through experiments that our estimator provides a significant performance improvement compared with competing CVaR estimators in finite samples. As a consequence of our bias correction method, it is also shown that a much lower threshold can be selected without introducing significant bias. This allows a larger portion of data to be be used in CVaR estimation compared with the typical POT approach, leading to more stable estimates. As secondary results, a new estimator for a second-order parameter of heavy-tailed distributions is derived, as well as a confidence interval for the CVaR which enables quantifying the level of variability in our estimator.
IVMar 5, 2020
Generating Embroidery Patterns Using Image-to-Image TranslationMohammad Akif Beg, Jia Yuan Yu
In many scenarios in computer vision, machine learning, and computer graphics, there is a requirement to learn the mapping from an image of one domain to an image of another domain, called Image-to-image translation. For example, style transfer, object transfiguration, visually altering the appearance of weather conditions in an image, changing the appearance of a day image into a night image or vice versa, photo enhancement, to name a few. In this paper, we propose two machine learning techniques to solve the embroidery image-to-image translation. Our goal is to generate a preview image which looks similar to an embroidered image, from a user-uploaded image. Our techniques are modifications of two existing techniques, neural style transfer, and cycle-consistent generative-adversarial network. Neural style transfer renders the semantic content of an image from one domain in the style of a different image in another domain, whereas a cycle-consistent generative adversarial network learns the mapping from an input image to output image without any paired training data, and also learn a loss function to train this mapping. Furthermore, the techniques we propose are independent of any embroidery attributes, such as elevation of the image, light-source, start, and endpoints of a stitch, type of stitch used, fabric type, etc. Given the user image, our techniques can generate a preview image which looks similar to an embroidered image. We train and test our propose techniques on an embroidery dataset which consist of simple 2D images. To do so, we prepare an unpaired embroidery dataset with more than 8000 user-uploaded images along with embroidered images. Empirical results show that these techniques successfully generate an approximate preview of an embroidered version of a user image, which can help users in decision making.
MLDec 3, 2019
Risk-Averse Action Selection Using Extreme Value Theory Estimates of the CVaRDylan Troop, Frédéric Godin, Jia Yuan Yu
In a wide variety of sequential decision making problems, it can be important to estimate the impact of rare events in order to minimize risk exposure. A popular risk measure is the conditional value-at-risk (CVaR), which is commonly estimated by averaging observations that occur beyond a quantile at a given confidence level. When this confidence level is very high, this estimation method can exhibit high variance due to the limited number of samples above the corresponding quantile. To mitigate this problem, extreme value theory can be used to derive an estimator for the CVaR that uses extrapolation beyond available samples. This estimator requires the selection of a threshold parameter to work well, which is a difficult challenge that has been widely studied in the extreme value theory literature. In this paper, we present an estimation procedure for the CVaR that combines extreme value theory and a recently introduced method of automated threshold selection by \cite{bader2018automated}. Under appropriate conditions, we estimate the tail risk using a generalized Pareto distribution. We compare empirically this estimation procedure with the commonly used method of sample averaging, and show an improvement in performance for some distributions. We finally show how the estimation procedure can be used in reinforcement learning by applying our method to the multi-arm bandit problem where the goal is to avoid catastrophic risk.
AIJul 9, 2019
A Scheme for Dynamic Risk-Sensitive Sequential Decision MakingShuai Ma, Jia Yuan Yu, Ahmet Satir
We present a scheme for sequential decision making with a risk-sensitive objective and constraints in a dynamic environment. A neural network is trained as an approximator of the mapping from parameter space to space of risk and policy with risk-sensitive constraints. For a given risk-sensitive problem, in which the objective and constraints are, or can be estimated by, functions of the mean and variance of return, we generate a synthetic dataset as training data. Parameters defining a targeted process might be dynamic, i.e., they might vary over time, so we sample them within specified intervals to deal with these dynamics. We show that: i). Most risk measures can be estimated using return variance; ii). By virtue of the state-augmentation transformation, practical problems modeled by Markov decision processes with stochastic rewards can be solved in a risk-sensitive scenario; and iii). The proposed scheme is validated by a numerical experiment.
LGJul 9, 2019
Variance-Based Risk Estimations in Markov Processes via Transformation with State LumpingShuai Ma, Jia Yuan Yu
Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that, the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To relieve the enlarged state space, a novel definition of isotopic states is proposed for state lumping, considering the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lumping in the SAT, errors from a naive reward simplification, and the validity of the SAT for the two risk estimations.
SYApr 29, 2019
On the Control of Agents Coupled through Shared Unit-demand ResourcesSyed Eqbal Alam, Robert Shorten, Fabian Wirth et al.
We consider a control problem involving several agents coupled through multiple unit-demand resources. Such resources are indivisible, and each agent's consumption is modeled as a Bernoulli random variable. Controlling the number of such agents in a probabilistic manner, subject to capacity constraints, is ubiquitous in smart cities. For instance, such agents can be humans in a feedback loop---who respond to a price signal, or automated decision-support systems that strive toward system-level goals. In this paper, we consider both single feedback loop corresponding to a single resource and multiple coupled feedback loops corresponding to multiple resources consumed by the same population of agents. For example, when a network of devices allocates resources to deliver several services, these services are coupled through capacity constraints on the resources. We propose a new algorithm with fundamental guarantees of convergence and optimality, as well as present an example illustrating its performance.
CVAug 17, 2018
Efficient Single-Shot Multibox Detector for Construction Site MonitoringViral Thakar, Himani Saini, Walid Ahmed et al.
Asset monitoring in construction sites is an intricate, manually intensive task, that can highly benefit from automated solutions engineered using deep neural networks. We use Single-Shot Multibox Detector --- SSD, for its fine balance between speed and accuracy, to leverage ubiquitously available images and videos from the surveillance cameras on the construction sites and automate the monitoring tasks, hence enabling project managers to better track the performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of a greedy approach like non-maximum suppression. We do so using Affinity Propagation Clustering --- APC to cluster the predicted boxes based on the similarity index computed using the spatial features as well as location of predicted boxes. In our attempts, we have been able to improve the mean average precision of SSD by 3.77% on custom dataset consist of images from construction sites and by 1.67% on PASCAL VOC Challenge.
CVAug 17, 2018
Ensemble-based Adaptive Single-shot Multi-box DetectorViral Thakar, Walid Ahmed, Mohammad M Soltani et al.
We propose two improvements to the SSD---single shot multibox detector. First, we propose an adaptive approach for default box selection in SSD. This uses data to reduce the uncertainty in the selection of best aspect ratios for the default boxes and improves performance of SSD for datasets containing small and complex objects (e.g., equipments at construction sites). We do so by finding the distribution of aspect ratios of the given training dataset, and then choosing representative values. Secondly, we propose an ensemble algorithm, using SSD as components, which improves the performance of SSD, especially for small amount of training datasets. Compared to the conventional SSD algorithm, adaptive box selection improves mean average precision by 3%, while ensemble-based SSD improves it by 8%.
AIApr 16, 2018
State-Augmentation Transformations for Risk-Sensitive Reinforcement LearningShuai Ma, Jia Yuan Yu
In the framework of MDP, although the general reward function takes three arguments-current state, action, and successor state; it is often simplified to a function of two arguments-current state and action. The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves the expected cumulative reward only, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We present state-augmentation transformations (SATs), which preserve the reward sequences as well as the reward distributions and the optimal policy in risk-sensitive reinforcement learning. In risk-sensitive scenarios, firstly we prove that, for every MDP with a stochastic transition-based reward function, there exists an MDP with a deterministic state-based reward function, such that for any given (randomized) policy for the first MDP, there exists a corresponding policy for the second MDP, such that both Markov reward processes share the same reward sequence. Secondly we illustrate that two situations require the proposed SATs in an inventory control problem. One could be using Q-learning (or other learning methods) on MDPs with transition-based reward functions, and the other could be using methods, which are for the Markov processes with a deterministic state-based reward functions, on the Markov processes with general reward functions. We show the advantage of the SATs by considering Value-at-Risk as an example, which is a risk measure on the reward distribution instead of the measures (such as mean and variance) of the distribution. We illustrate the error in the reward distribution estimation from the direct use of Q-learning, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
SIDec 19, 2017
The Merits of Sharing a RidePooyan Ehsani, Jia Yuan Yu
The culture of sharing instead of ownership is sharply increasing in individuals behaviors. Particularly in transportation, concepts of sharing a ride in either carpooling or ridesharing have been recently adopted. An efficient optimization approach to match passengers in real-time is the core of any ridesharing system. In this paper, we model ridesharing as an online matching problem on general graphs such that passengers do not drive private cars and use shared taxis. We propose an optimization algorithm to solve it. The outlined algorithm calculates the optimal waiting time when a passenger arrives. This leads to a matching with minimal overall overheads while maximizing the number of partnerships. To evaluate the behavior of our algorithm, we used NYC taxi real-life data set. Results represent a substantial reduction in overall overheads.
AIDec 7, 2016
Transition-based versus State-based Reward Functions for MDPs with Value-at-RiskShuai Ma, Jia Yuan Yu
In reinforcement learning, the reward function on current state and action is widely used. When the objective is about the expectation of the (discounted) total reward only, it works perfectly. However, if the objective involves the total reward distribution, the result will be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions, which share the same expectations. Firstly we show that with VaR objective, when the real reward function is transition-based (with respect to action and both current and next states), the simplified (state-based, with respect to action and current state only) reward function will change the VaR. Secondly, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Thirdly, since the estimation method is for a Markov reward process with the reward function on current state only, we present a transformation algorithm for the Markov reward process with the reward function on current and next states, in order to estimate the VaR function with an intact total reward distribution.
OCJan 25, 2016
Pricing Vehicle Sharing with Proximity InformationJakub Marecek, Robert Shorten, Jia Yuan Yu
For vehicle sharing schemes, where drop-off positions are not fixed, we propose a pricing scheme, where the price depends in part on the distance between where a vehicle is being dropped off and where the closest shared vehicle is parked. Under certain restrictive assumptions, we show that this pricing leads to a socially optimal spread of the vehicles within a region.
AISep 29, 2015
Two Phase $Q-$learning for Bidding-based Vehicle SharingYinlam Chow, Jia Yuan Yu, Marco Pavone
We consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars, and quality of service, by pricing rentals appropriately. We propose a bidding approach that is inspired from auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, time of requests, and duration of trips). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether to rent or not. The operator can even accept negative bids to motivate drivers to rebalance available cars to unpopular destinations within a city. We model the operator's sequential decision-making problem as a \emph{constrained Markov decision problem} (CMDP) and propose and rigorously analyze a novel two phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.
MLMay 10, 2014
Functional BanditsLong Tran-Thanh, Jia Yuan Yu
We introduce the functional bandit problem, where the objective is to find an arm that optimises a known functional of the unknown arm-reward distributions. These problems arise in many settings such as maximum entropy methods in natural language processing, and risk-averse decision-making, but current best-arm identification techniques fail in these domains. We propose a new approach, that combines functional estimation and arm elimination, to tackle this problem. This method achieves provably efficient performance guarantees. In addition, we illustrate this method on a number of important functionals in risk management and information theory, and refine our generic theoretical results in those cases.
OCApr 9, 2014
r-Extreme Signalling for Congestion ControlJakub Marecek, Robert Shorten, Jia Yuan Yu
In many "smart city" applications, congestion arises in part due to the nature of signals received by individuals from a central authority. In the model of Marecek et al. [arXiv:1406.7639, Int. J. Control 88(10), 2015], each agent uses one out of multiple resources at each time instant. The per-use cost of a resource depends on the number of concurrent users. A central authority has up-to-date knowledge of the congestion across all resources and uses randomisation to provide a scalar or an interval for each resource at each time. In this paper, the interval to broadcast per resource is obtained by taking the minima and maxima of costs observed within a time window of length r, rather than by randomisation. We show that the resulting distribution of agents across resources also converges in distribution, under plausible assumptions about the evolution of the population over time.