GTFeb 20, 2023
Logit-Q Dynamics for Efficient Learning in Stochastic TeamsAhmed Said Donmez, Onur Unlu, Muhammed O. Sayin
We present a new family of logit-Q dynamics for efficient learning in stochastic games by combining the log-linear learning (also known as logit dynamics) for the repeated play of normal-form games with Q-learning for unknown Markov decision processes within the auxiliary stage-game framework. In this framework, we view stochastic games as agents repeatedly playing some stage game associated with the current state of the underlying game while the agents' Q-functions determine the payoffs of these stage games. We show that the logit-Q dynamics presented reach (near) efficient equilibrium in stochastic teams with unknown dynamics and quantify the approximation error. We also show the rationality of the logit-Q dynamics against agents following pure stationary strategies and the convergence of the dynamics in stochastic games where the stage-payoffs induce potential games, yet only a single agent controls the state transitions beyond stochastic teams. The key idea is to approximate the dynamics with a fictional scenario where the Q-function estimates are stationary over epochs whose lengths grow at a sufficiently slow rate. We then couple the dynamics in the main and fictional scenarios to show that these two scenarios become more and more similar across epochs due to the vanishing step size and growing epoch lengths.
GTApr 18
Controlling Traffic without Tolls: A Non-Monetary Framework for Autonomous IntersectionsArda Kosay, Yusuf Saltan, Jyun-Jhe Wang et al.
The increasing complexity of urban transportation systems, driven by connected and automated vehicles, calls for new modeling paradigms and scalable control strategies. We propose a non-monetary control framework that leverages autonomous intersection management to influence routing decisions without tolls. The approach uses timestamp-based scheduling adjustments at roadside units (RSUs) to introduce path-dependent delays or advancements, steering traffic toward socially efficient flows. We develop a hierarchical architecture that separates real-time intersection control from network-level coordination. The resulting model admits a congestion-game formulation with path-dependent node costs. We establish the existence and essential uniqueness of equilibrium flows, eliminating ambiguities due to multiple equilibria and enabling a scalable and tractable bilevel optimization formulation for system-level incentive design. Experiments on the Sioux Falls network show that the proposed approach reduces the efficiency gap between user equilibrium and system-optimal flows by up to 71% under realistic constraints. These results demonstrate the potential of non-monetary, infrastructure-light control for next-generation intelligent transportation and urban mobility systems.
GTMar 13, 2024
Strategizing against Q-learners: A Control-theoretical ApproachYuksel Arslantas, Ege Yuceel, Muhammed O. Sayin
In this paper, we explore the susceptibility of the independent Q-learning algorithms (a classical and widely used multi-agent reinforcement learning method) to strategic manipulation of sophisticated opponents in normal-form games played repeatedly. We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents' Q-learning algorithm. To this end, we formulate the strategic actors' interactions as a stochastic game (whose state encompasses Q-function estimates of the Q-learners) as if the Q-learning algorithms are the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space and analyze its performance for two competing strategic actors and a single strategic actor both analytically and numerically.
GTNov 23, 2021
Independent Learning in Stochastic GamesAsuman Ozdaglar, Muhammed O. Sayin, Kaiqing Zhang
Reinforcement learning (RL) has recently achieved tremendous successes in many artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go games, autonomous driving, and robotics. Unfortunately, the framework upon which classical RL builds is inappropriate for multi-agent learning, as it assumes an agent's environment is stationary and does not take into account the adaptivity of other agents. In this review paper, we present the model of stochastic games for multi-agent learning in dynamic environments. We focus on the development of simple and independent learning dynamics for stochastic games: each agent is myopic and chooses best-response type actions to other agents' strategy without any coordination with her opponent. There has been limited progress on developing convergent best-response type independent learning dynamics for stochastic games. We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games, together with a review of other contemporaneous algorithms for dynamic multi-agent learning in this setting. Along the way, we also reexamine some classical results from both the game theory and RL literature, to situate both the conceptual contributions of our independent learning dynamics, and the mathematical novelties of our analysis. We hope this review paper serves as an impetus for the resurgence of studying independent and natural learning dynamics in game theory, for the more challenging settings with a dynamic environment.
GTJun 4, 2021
Decentralized Q-Learning in Zero-sum Markov GamesMuhammed O. Sayin, Kaiqing Zhang, David S. Leslie et al.
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being even oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature of learning in games. In this paper, we develop a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy; when both agents adopt the learning dynamics, they converge to the Nash equilibrium of the game. The key challenge in this decentralized setting is the non-stationarity of the environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policies simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale.
GTOct 8, 2020
Fictitious play in zero-sum stochastic gamesMuhammed O. Sayin, Francesca Parise, Asuman Ozdaglar
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response by using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
CRFeb 4, 2019
Deception-As-Defense Framework for Cyber-Physical SystemsMuhammed O. Sayin, Tamer Basar
We introduce deceptive signaling framework as a new defense measure against advanced adversaries in cyber-physical systems. In general, adversaries look for system-related information, e.g., the underlying state of the system, in order to learn the system dynamics and to receive useful feedback regarding the success/failure of their actions so as to carry out their malicious task. To this end, we craft the information that is accessible to adversaries strategically in order to control their actions in a way that will benefit the system, indirectly and without any explicit enforcement. Under the solution concept of game-theoretic hierarchical equilibrium, we arrive at a semi-definite programming problem equivalent to the infinite-dimensional optimization problem faced by the defender while selecting the best strategy when the information of interest is Gaussian and both sides have quadratic cost functions. The equivalence result holds also for the scenarios where the defender can have partial or noisy measurements or the objective of the adversary is not known. We show the optimality of linear signaling rule within the general class of measurable policies in communication scenarios and also compute the optimal linear signaling rule in control scenarios.
LGJan 30, 2019
Reliable Smart Road SignsMuhammed O. Sayin, Chung-Wei Lin, Eunsuk Kang et al.
In this paper, we propose a game theoretical adversarial intervention detection mechanism for reliable smart road signs. A future trend in intelligent transportation systems is ``smart road signs" that incorporate smart codes (e.g., visible at infrared) on their surface to provide more detailed information to smart vehicles. Such smart codes make road sign classification problem aligned with communication settings more than conventional classification. This enables us to integrate well-established results in communication theory, e.g., error-correction methods, into road sign classification problem. Recently, vision-based road sign classification algorithms have been shown to be vulnerable against (even) small scale adversarial interventions that are imperceptible for humans. On the other hand, smart codes constructed via error-correction methods can lead to robustness against small scale intelligent or random perturbations on them. In the recognition of smart road signs, however, humans are out of the loop since they cannot see or interpret them. Therefore, there is no equivalent concept of imperceptible perturbations in order to achieve a comparable performance with humans. Robustness against small scale perturbations would not be sufficient since the attacker can attack more aggressively without such a constraint. Under a game theoretical solution concept, we seek to ensure certain measure of guarantees against even the worst case (intelligent) attackers that can perturb the signal even at large scale. We provide a randomized detection strategy based on the distance between the decoder output and the received input, i.e., error rate. Finally, we examine the performance of the proposed scheme over various scenarios.
SYJan 30, 2019
Persuasion-based Robust Sensor Design Against Attackers with Unknown Control ObjectivesMuhammed O. Sayin, Tamer Basar
In this paper, we introduce a robust sensor design framework to provide "persuasion-based" defense in stochastic control systems against an unknown type attacker with a control objective exclusive to its type. For effective control, such an attacker's actions depend on its belief on the underlying state of the system. We design a robust "linear-plus-noise" signaling strategy to encode sensor outputs in order to shape the attacker's belief in a strategic way and correspondingly to persuade the attacker to take actions that lead to minimum damage with respect to the system's objective. The specific model we adopt is a Gauss-Markov process driven by a controller with a (partially) "unknown" malicious/benign control objective. We seek to defend against the worst possible distribution over control objectives in a robust way under the solution concept of Stackelberg equilibrium, where the sensor is the leader. We show that a necessary and sufficient condition on the covariance matrix of the posterior belief is a certain linear matrix inequality and we provide a closed-form solution for the associated signaling strategy. This enables us to formulate an equivalent tractable problem, indeed a semi-definite program, to compute the robust sensor design strategies "globally" even though the original optimization problem is non-convex and highly nonlinear. We also extend this result to scenarios where the sensor makes noisy or partial measurements. Finally, we analyze the ensuing performance numerically for various scenarios.
AIFeb 22, 2018
Reliable Intersection Control in Non-cooperative EnvironmentsMuhammed O. Sayin, Chung-Wei Lin, Shinichi Shiraishi et al.
We propose a reliable intersection control mechanism for strategic autonomous and connected vehicles (agents) in non-cooperative environments. Each agent has access to his/her earliest possible and desired passing times, and reports a passing time to the intersection manager, who allocates the intersection temporally to the agents in a First-Come-First-Serve basis. However, the agents might have conflicting interests and can take actions strategically. To this end, we analyze the strategic behaviors of the agents and formulate Nash equilibria for all possible scenarios. Furthermore, among all Nash equilibria we identify a socially optimal equilibrium that leads to a fair intersection allocation, and correspondingly we describe a strategy-proof intersection mechanism, which achieves reliable intersection control such that the strategic agents do not have any incentive to misreport their passing times strategically.
SYOct 3, 2016
Team-Optimal Distributed MMSE Estimation in General and Tree NetworksMuhammed O. Sayin, Suleyman S. Kozat, Tamer Başar
We construct team-optimal estimation algorithms over distributed networks for state estimation in the finite-horizon mean-square error (MSE) sense. Here, we have a distributed collection of agents with processing and cooperation capabilities. These agents observe noisy samples of a desired state through a linear model and seek to learn this state by interacting with each other. Although this problem has attracted significant attention and been studied extensively in fields including machine learning and signal processing, all the well-known strategies do not achieve team-optimal learning performance in the finite-horizon MSE sense. To this end, we formulate the finite-horizon distributed minimum MSE (MMSE) when there is no restriction on the size of the disclosed information, i.e., oracle performance, over an arbitrary network topology. Subsequently, we show that exchange of local estimates is sufficient to achieve the oracle performance only over certain network topologies. By inspecting these network structures, we propose recursive algorithms achieving the oracle performance through the disclosure of local estimates. For practical implementations we also provide approaches to reduce the complexity of the algorithms through the time-windowing of the observations. Finally, in the numerical examples, we demonstrate the superior performance of the introduced algorithms in the finite-horizon MSE sense due to optimal estimation.
LGJan 23, 2014
Predicting Nearly As Well As the Optimal Twice Differentiable RegressorN. Denizcan Vanli, Muhammed O. Sayin, Suleyman S. Kozat
We study nonlinear regression of real valued data in an individual sequence manner, where we provide results that are guaranteed to hold without any statistical assumptions. We address the convergence and undertraining issues of conventional nonlinear regression methods and introduce an algorithm that elegantly mitigates these issues via an incremental hierarchical structure, (i.e., via an incremental decision tree). Particularly, we present a piecewise linear (or nonlinear) regression algorithm that partitions the regressor space in a data driven manner and learns a linear model at each region. Unlike the conventional approaches, our algorithm gradually increases the number of disjoint partitions on the regressor space in a sequential manner according to the observed data. Through this data driven approach, our algorithm sequentially and asymptotically achieves the performance of the optimal twice differentiable regression function for any data sequence with an unknown and arbitrary length. The computational complexity of the introduced algorithm is only logarithmic in the data length under certain regularity conditions. We provide the explicit description of the algorithm and demonstrate the significant gains for the well-known benchmark real data sets and chaotic signals.
LGNov 26, 2013
A Novel Family of Adaptive Filtering Algorithms Based on The Logarithmic CostMuhammed O. Sayin, N. Denizcan Vanli, Suleyman S. Kozat
We introduce a novel family of adaptive filtering algorithms based on a relative logarithmic cost. The new family intrinsically combines the higher and lower order measures of the error into a single continuous update based on the error amount. We introduce important members of this family of algorithms such as the least mean logarithmic square (LMLS) and least logarithmic absolute difference (LLAD) algorithms that improve the convergence performance of the conventional algorithms. However, our approach and analysis are generic such that they cover other well-known cost functions as described in the paper. The LMLS algorithm achieves comparable convergence performance with the least mean fourth (LMF) algorithm and extends the stability bound on the step size. The LLAD and least mean square (LMS) algorithms demonstrate similar convergence performance in impulse-free noise environments while the LLAD algorithm is robust against impulsive interferences and outperforms the sign algorithm (SA). We analyze the transient, steady state and tracking performance of the introduced algorithms and demonstrate the match of the theoretical analyzes and simulation results. We show the extended stability bound of the LMLS algorithm and analyze the robustness of the LLAD algorithm against impulsive interferences. Finally, we demonstrate the performance of our algorithms in different scenarios through numerical examples.