LG Feb 14, 2023
A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation
Jiaming Cheng, Duong Thuy Anh Nguyen, Lele Wang et al.
Edge Computing (EC) offers a superior user experience by positioning cloud resources in close proximity to end users. Allocating edge resources efficiently while maximizing profit for the EC platform remains a difficult problem, especially when resource requests arrive online. To address this challenge, we cast the problem as a multi-armed bandit problem and develop two novel online pricing mechanisms, the Kullback-Leibler Upper Confidence Bound (KL-UCB) algorithm and the Min-Max Optimal algorithm, for heterogeneous edge resource allocation. These mechanisms operate in real time and do not require prior knowledge of the demand distribution, which can be difficult to obtain in practice. The proposed posted pricing schemes allow users to select and pay for their preferred resources, with the platform dynamically adjusting resource prices based on observed historical data. Numerical results show the advantages of the proposed mechanisms compared to several benchmark schemes derived from traditional bandit algorithms, including the Epsilon-Greedy, basic UCB, and Thompson Sampling algorithms.
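To make the bandit view of posted pricing concrete, the following is a minimal sketch, not the paper's mechanism. It assumes a single resource, a small hypothetical price grid where each price is one arm, and a buyer who accepts each price with a fixed but unknown probability. The names `run_pricing`, `kl_ucb_index`, and `accept_prob` are illustrative. The platform posts the price that maximizes price times the standard Bernoulli KL-UCB upper bound on the acceptance probability.

```python
import math
import random

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iters=30):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection."""
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def run_pricing(prices, accept_prob, horizon, seed=0):
    """Simulate posted pricing; accept_prob is hidden from the learner (assumption)."""
    rng = random.Random(seed)
    k = len(prices)
    pulls = [0] * k   # how often each price was posted
    sales = [0] * k   # how often each price was accepted
    revenue = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # post every price once to initialize the estimates
        else:
            def score(i):
                # Optimistic revenue: price times upper bound on acceptance rate.
                return prices[i] * kl_ucb_index(sales[i] / pulls[i], pulls[i], t)
            arm = max(range(k), key=score)
        sold = rng.random() < accept_prob[arm]
        pulls[arm] += 1
        sales[arm] += int(sold)
        revenue += prices[arm] * sold
    return pulls, revenue
```

For example, `run_pricing([1.0, 2.0, 3.0], [0.9, 0.7, 0.2], horizon=5000)` simulates a market in which the middle price has the highest expected revenue (1.4 per round), so a well-behaved learner should post it most often.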
33.7 SY Apr 8
Projected Variational Quantum Extragradient for Zero-Sum Games
Duong The Do, Matthew Aldridge, Duong Tung Nguyen
We propose a projected variational quantum extragradient (VQEG) framework for computing approximate Nash equilibria in two-player zero-sum matrix games. Mixed strategies are parameterized as Born distributions of parameterized quantum circuits (PQCs), transforming the classical bilinear saddle-point problem into a smooth but generally nonconvex-nonconcave minmax optimization in circuit-parameter space. The expected payoff is expressed as the expectation of a diagonal observable, enabling gradient evaluation via the parameter-shift rule and compatibility with shot-based quantum hardware. To support arbitrary game sizes, we introduce a dominated embedding that maps (m,n) games to qubit-compatible power-of-two dimensions while preserving equilibrium structure. We then develop a projected extragradient method using stochastic gradient estimates derived from finite measurement shots, and establish variance bounds scaling as O(1/S) with respect to the number of measurement shots S, along with convergence to approximate first-order stationarity under standard assumptions. Since stationarity does not guarantee equilibrium optimality, we evaluate performance using the game-space Nash gap. Numerical results demonstrate high-precision solutions on structured instances up to 32x32, while highlighting challenges in unstructured settings.
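As a point of reference for the extragradient step and the Nash gap, here is a minimal classical sketch. It is not the quantum VQEG method: strategies are plain probability vectors rather than Born distributions of a circuit, gradients are exact rather than shot-based, and the projection is onto the simplex. The convention that the row player maximizes x^T A y, along with the function names, is an assumption made for illustration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def nash_gap(A, x, y):
    """Best-response gain of both players; zero exactly at a Nash equilibrium."""
    return float(np.max(A @ y) - np.min(A.T @ x))

def extragradient(A, x, y, step=0.1, iters=2000):
    """Projected extragradient: row player ascends, column player descends."""
    for _ in range(iters):
        # Extrapolation step using gradients at the current point.
        x_half = project_simplex(x + step * (A @ y))
        y_half = project_simplex(y - step * (A.T @ x))
        # Update step using gradients at the extrapolated point.
        x = project_simplex(x + step * (A @ y_half))
        y = project_simplex(y - step * (A.T @ x_half))
    return x, y
```

On rock-paper-scissors, starting both players from pure strategies, the iterates should approach the uniform mixed strategy and the Nash gap should shrink toward zero. The step size is chosen below the reciprocal of the spectral norm of the payoff matrix.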
51.7 NI Mar 30
Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference
Jiaming Cheng, Duong Tung Nguyen
This letter investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers (DCs) over time. Each DC features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. The central question is: how can inference workloads be optimally distributed to the DCs to minimize energy consumption, carbon emissions, and water usage while enhancing user experience? This letter proposes a novel optimization model for LLM service providers to reduce operational costs and environmental impacts. Numerical results validate the efficacy of the proposed approach.
SY Jan 25, 2016
Optimal Energy Management for SmartGrids Considering Thermal Load and Dynamic Pricing
Duong Tung Nguyen
More active participation of the demand side and efficient integration of distributed energy resources (DERs) such as electric vehicles (EVs), energy storage (ES), and renewable energy sources (RESs) into the existing power systems are important design objectives of the future smart grid. In general, effective demand side management (DSM) would benefit both system operators (e.g., peak demand reduction) and electricity customers (e.g., cost saving). For building and home energy scheduling design, heating, ventilation, and air-conditioning (HVAC) systems play a very important role, since HVAC power consumption is significant and the HVAC load can be scheduled flexibly while still maintaining user comfort requirements. This thesis focuses on energy scheduling design for two different application scenarios where HVAC and various DERs are considered to optimize the benefits of electric users.
OC Sep 25, 2024
Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks
Duong Thuy Anh Nguyen, Su Wang, Duong Tung Nguyen et al.
We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under DSGTm-TV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
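The interplay of row-stochastic mixing, column-stochastic gradient tracking, and heavy-ball momentum can be illustrated with a small sketch. This is not the authors' DSGTm-TV implementation: it assumes scalar quadratic local objectives f_i(x) = 0.5 (x - b_i)^2, exact gradients, a shared stepsize `alpha` and momentum `beta`, and a hypothetical schedule that cycles through a fixed list of strongly connected digraphs. All function names are illustrative.

```python
import numpy as np

def mixing_matrices(adj):
    """Build mixing weights from adj, where adj[i, j] = 1 if agent j sends to agent i.

    Self-loops must be included so every row and column sum is positive.
    """
    A = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic: mixes model estimates
    B = adj / adj.sum(axis=0, keepdims=True)  # column-stochastic: mixes gradient trackers
    return A, B

def gradient_tracking(b, graphs, alpha=0.02, beta=0.1, iters=5000):
    """Minimize sum_i 0.5 * (x - b[i])**2 with each agent holding one b[i]."""
    def grad(x):
        return x - b  # stacked local gradients, one entry per agent
    x_prev = np.zeros_like(b)
    x = np.zeros_like(b)
    y = grad(x)  # trackers start at the local gradients
    for k in range(iters):
        A, B = graphs[k % len(graphs)]  # time-varying topology (cyclic, by assumption)
        # Consensus step, descent along the tracked gradient, heavy-ball momentum.
        x_next = A @ x - alpha * y + beta * (x - x_prev)
        # Tracker update keeps sum(y) equal to the sum of current local gradients.
        y = B @ y + grad(x_next) - grad(x)
        x_prev, x = x, x_next
    return x
```

With four agents on two alternating directed rings (each with self-loops and one extra shortcut edge), every entry of the returned vector should approach the average of `b`, which is the minimizer of the global objective. Raw data `b[i]` never leaves agent `i`; only model estimates and tracker values are exchanged.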
14.4 LG Apr 8
Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference
Jiaming Cheng, Duong Tung Nguyen
Deploying large language model (LLM) inference at scale requires jointly selecting base models, provisioning heterogeneous GPUs, configuring parallelism, and distributing workloads under tight latency, accuracy, and budget constraints. Exact mixed-integer linear programming (MILP) approaches guarantee optimality but scale poorly. We propose two constraint-aware heuristics: a Greedy Heuristic (GH) for single-pass allocation, and an Adaptive Greedy Heuristic (AGH) that enhances GH via multi-start construction, relocate-based local search, and GPU consolidation. Three constraint-aware mechanisms -- TP-aware feasibility selection, cost-per-effective-coverage ranking, and TP upgrade -- ensure feasibility under tightly coupled memory, delay, error, and budget constraints. On workloads calibrated with the Azure LLM Inference Trace (2025), both heuristics produce feasible solutions in under one second, with AGH closely approaching optimal cost while achieving over 260x speedup on large-scale instances. Under out-of-sample stress tests with up to 1.5x parameter inflation, AGH maintains controlled SLO violations and stable cost, whereas the exact solver's placement degrades sharply.