ITDec 7, 2018
Scheduling a Human ChannelMelih Bastopcu, Sennur Ulukus
We consider a system where a human operator processes a sequence of tasks that are similar in nature under a total time constraint. In these systems, the performance of the operator depends on its past utilization. This is akin to $\textit{state-dependent}$ channels where the past actions of the transmitter affects the future quality of the channel (also known as $\textit{action-dependent}$ or $\textit{use-dependent}$ channels). For $\textit{human channels}$, a well-known psychological phenomena, known as $\textit{Yerkes-Dodson law}$, states that a human operator performs worse when he/she is over-utilized or under-utilized. Over such a $\textit{use-dependent}$ human channel, we consider the problem of maximizing a utility function, which is monotonically increasing and concave in the time allocated for each task, under explicit minimum and maximum $\textit{utilization}$ constraints. We show that the optimal solution is to keep the utilization ratio of the operator as high as possible, and to process all the tasks. We prove that the optimal policy consists of two major strategies: utilize the operator without resting until reaching the maximum allowable utilization ratio, and then alternate between working and resting the operator each time reaching the maximum allowable utilization at the end of work-period. We show that even though the tasks are similar in difficulty, the time allocated for the tasks can be different depending on the strategy in which a task is processed; however, the tasks processed in the same strategy are processed equally.
LGJan 15
Queueing-Aware Optimization of Reasoning Tokens for Accuracy-Latency Trade-offs in LLM ServersEmre Ozbas, Melih Bastopcu
We consider a single large language model (LLM) server that serves a heterogeneous stream of queries belonging to $N$ distinct task types. Queries arrive according to a Poisson process, and each type occurs with a known prior probability. For each task type, the server allocates a fixed number of internal thinking tokens, which determines the computational effort devoted to that query. The token allocation induces an accuracy-latency trade-off: the service time follows an approximately affine function of the allocated tokens, while the probability of a correct response exhibits diminishing returns. Under a first-in, first-out (FIFO) service discipline, the system operates as an $M/G/1$ queue, and the mean system time depends on the first and second moments of the resulting service-time distribution. We formulate a constrained optimization problem that maximizes a weighted average accuracy objective penalized by the mean system time, subject to architectural token-budget constraints and queue-stability conditions. The objective function is shown to be strictly concave over the stability region, which ensures existence and uniqueness of the optimal token allocation. The first-order optimality conditions yield a coupled projected fixed-point characterization of the optimum, together with an iterative solution and an explicit sufficient condition for contraction. Moreover, a projected gradient method with a computable global step-size bound is developed to guarantee convergence beyond the contractive regime. Finally, integer-valued token allocations are attained via rounding of the continuous solution, and the resulting performance loss is evaluated in simulation results.
GTMar 25, 2024
Policy Optimization finds Nash Equilibrium in Regularized General-Sum LQ GamesMuhammad Aneeq uz Zaman, Shubham Aggarwal, Melih Bastopcu et al.
In this paper, we investigate the impact of introducing relative entropy regularization on the Nash Equilibria (NE) of General-Sum $N$-agent games, revealing the fact that the NE of such games conform to linear Gaussian policies. Moreover, it delineates sufficient conditions, contingent upon the adequacy of entropy regularization, for the uniqueness of the NE within the game. As Policy Optimization serves as a foundational approach for Reinforcement Learning (RL) techniques aimed at finding the NE, in this work we prove the linear convergence of a policy optimization algorithm which (subject to the adequacy of entropy regularization) is capable of provably attaining the NE. Furthermore, in scenarios where the entropy regularization proves insufficient, we present a $δ$-augmentation technique, which facilitates the achievement of an $ε$-NE within the game.
GTMar 13, 2024
Learning How to Strategically Disclose InformationRaj Kiriti Velicheti, Melih Bastopcu, S. Rasoul Etesami et al.
Strategic information disclosure, in its simplest form, considers a game between an information provider (sender) who has access to some private information that an information receiver is interested in. While the receiver takes an action that affects the utilities of both players, the sender can design information (or modify beliefs) of the receiver through signal commitment, hence posing a Stackelberg game. However, obtaining a Stackelberg equilibrium for this game traditionally requires the sender to have access to the receiver's objective. In this work, we consider an online version of information design where a sender interacts with a receiver of an unknown type who is adversarially chosen at each round. Restricting attention to Gaussian prior and quadratic costs for the sender and the receiver, we show that $\mathcal{O}(\sqrt{T})$ regret is achievable with full information feedback, where $T$ is the total number of interactions between the sender and the receiver. Further, we propose a novel parametrization that allows the sender to achieve $\mathcal{O}(\sqrt{T})$ regret for a general convex utility function. We then consider the Bayesian Persuasion problem with an additional cost term in the objective function, which penalizes signaling policies that are more informative and obtain $\mathcal{O}(\log(T))$ regret. Finally, we establish a sublinear regret bound for the partial information feedback setting and provide simulations to support our theoretical results.
GTAug 29, 2025
A Soft Inducement Framework for Incentive-Aided Steering of No-Regret PlayersAsrin Efe Yorulmaz, Raj Kiriti Velicheti, Melih Bastopcu et al.
In this work, we investigate a steering problem in a mediator-augmented two-player normal-form game, where the mediator aims to guide players toward a specific action profile through information and incentive design. We first characterize the games for which successful steering is possible. Moreover, we establish that steering players to any desired action profile is not always achievable with information design alone, nor when accompanied with sublinear payment schemes. Consequently, we derive a lower bound on the constant payments required per round to achieve this goal. To address these limitations incurred with information design, we introduce an augmented approach that involves a one-shot information design phase before the start of the repeated game, transforming the prior interaction into a Stackelberg game. Finally, we theoretically demonstrate that this approach improves the convergence rate of players' action profiles to the target point by a constant factor with high probability, and support it with empirical results.
LGAug 4, 2025
Balancing Information Accuracy and Response Timeliness in Networked LLMsYigit Turkmen, Baturalp Buyukates, Melih Bastopcu
Recent advancements in Large Language Models (LLMs) have transformed many fields including scientific discovery, content generation, biomedical text mining, and educational technology. However, the substantial requirements for training data, computational resources, and energy consumption pose significant challenges for their practical deployment. A promising alternative is to leverage smaller, specialized language models and aggregate their outputs to improve overall response quality. In this work, we investigate a networked LLM system composed of multiple users, a central task processor, and clusters of topic-specialized LLMs. Each user submits categorical binary (true/false) queries, which are routed by the task processor to a selected cluster of $m$ LLMs. After gathering individual responses, the processor returns a final aggregated answer to the user. We characterize both the information accuracy and response timeliness in this setting, and formulate a joint optimization problem to balance these two competing objectives. Our extensive simulations demonstrate that the aggregated responses consistently achieve higher accuracy than those of individual LLMs. Notably, this improvement is more significant when the participating LLMs exhibit similar standalone performance.