LGApr 19, 2023
Graph Exploration for Effective Multi-agent Q-LearningAinur Zhaikhan, Ali H. Sayed
This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume the individual rewards received by the agents are independent of the actions by the other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to execute more efficient explorative behaviour. Different from existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without requiring complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange. And for continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is verified with theoretical results for discrete-state scenarios and with experiments for continuous ones.
LGJul 6, 2024
Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable EnvironmentsAinur Zhaikhan, Ali H. Sayed
This study proposes the use of a social learning method to estimate a global state within a multi-agent off-policy actor-critic algorithm for reinforcement learning (RL) operating in a partially observable environment. We assume that the network of agents operates in a fully-decentralized manner, possessing the capability to exchange variables with their immediate neighbors. The proposed design methodology is supported by an analysis demonstrating that the difference between final outcomes, obtained when the global state is fully observed versus estimated through the social learning method, is $\varepsilon$-bounded when an appropriate number of iterations of social learning updates are implemented. Unlike many existing dec-POMDP-based RL approaches, the proposed algorithm is suitable for model-free multi-agent reinforcement learning as it does not require knowledge of a transition model. Furthermore, experimental results illustrate the efficacy of the algorithm and demonstrate its superiority over the current state-of-the-art methods.
CROct 24, 2020
Safeguarding the IoT from Malware Epidemics: A Percolation Theory ApproachAinur Zhaikhan, Mustafa A. Kishk, Hesham ElSawy et al.
The upcoming Internet of things (IoT) is foreseen to encompass massive numbers of connected devices, smart objects, and cyber-physical systems. Due to the large-scale and massive deployment of devices, it is deemed infeasible to safeguard 100% of the devices with state-of-the-art security countermeasures. Hence, large-scale IoT has inevitable loopholes for network intrusion and malware infiltration. Even worse, exploiting the high density of devices and direct wireless connectivity, malware infection can stealthily propagate through susceptible (i.e., unsecured) devices and form an epidemic outbreak without being noticed to security administration. A malware outbreak enables adversaries to compromise large population of devices, which can be exploited to launch versatile cyber and physical malicious attacks. In this context, we utilize spatial firewalls, to safeguard the IoT from malware outbreak. In particular, spatial firewalls are computationally capable devices equipped with state-of-the-art security and anti-malware programs that are spatially deployed across the network to filter the wireless traffic in order to detect and thwart malware propagation. Using tools from percolation theory, we prove that there exists a critical density of spatial firewalls beyond which malware outbreak is impossible. This, in turns, safeguards the IoT from malware epidemics regardless of the infection/treatment rates. To this end, a tractable upper bound for the critical density of spatial firewalls is obtained. Furthermore, we characterize the relative communications ranges of the spatial firewalls and IoT devices to ensure secure network connectivity. The percentage of devices secured by the firewalls is also characterized.