Seyed Pooya Shariatpanahi

LG
h-index: 27
6 papers
16 citations
Novelty: 50%
AI Score: 41

6 Papers

LG, Mar 12
Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli et al.

Personalized AI-based services involve a population of individual reinforcement learning agents. However, most reinforcement learning algorithms focus on harnessing individual learning and fail to leverage the social learning capabilities commonly exhibited by humans and animals. Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. In this study, we focus on a social bandit learning scenario where a social agent observes other agents' actions without knowledge of their rewards. The agents independently pursue their own policy without explicit motivation to teach each other. We propose a free energy-based social bandit learning algorithm over the policy space, where the social agent evaluates others' expertise levels without resorting to any oracle or social norms. Accordingly, the social agent integrates its direct experiences in the environment and others' estimated policies. The theoretical convergence of our algorithm to the optimal policy is proven. Empirical evaluations validate the superiority of our social learning method over alternative approaches in various scenarios. Our algorithm strategically identifies the relevant agents, even in the presence of random or suboptimal agents, and skillfully exploits their behavioral information. Beyond societies that include expert agents, our algorithm also significantly enhances individual learning performance when only relevant but non-expert agents are present, a setting where most related methods fail. Importantly, it also maintains logarithmic regret.

CR, Mar 7
TopRank-Based Delivery Rate Optimization for Coded Caching under Non-Uniform Demands

Mohammadsaber Bahadori, Seyed Pooya Shariatpanahi, Behnam Bahrak

We study the problem of coded caching with nonuniform file popularity under the setting where the popularity distribution is initially unknown. By reframing the problem, we propose a method inspired by an algorithm from the recommender-systems literature and multi-armed bandits. Unlike prior approaches, which focus on accurately estimating file popularities, our method ranks files relative to one another and partitions them into groups. This perspective is more consistent with the structure of prior approaches as well, since earlier methods also divided files into popular and non-popular groups after estimating their popularities. The proposed approach relies on differences in request counts between files as the basis for ranking, and under many conditions it outperforms the previous algorithm. In particular, we obtain significantly improved performance in scenarios where the number of users in the network is small, the cache storage capacity is limited, or the learning process of the true popularity of files based on observations is contaminated by exploratory or synthetic requests that do not match the true popularity distribution. In these cases, our policy achieves markedly better performance and attains sublinear regret.
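As a rough illustration of ranking files by differences in request counts and splitting them into popular and non-popular groups, the sketch below sorts files by observed count and cuts the ranking at the largest gap between neighbors. The function name `partition_by_count_gap` and the largest-gap rule are assumptions made for this example; this is not the paper's TopRank-based procedure, only a minimal instance of the count-difference idea.

```python
from collections import Counter


def partition_by_count_gap(requests, num_files):
    """Split file ids into (popular, non_popular) at the largest gap in request counts.

    Hypothetical helper for illustration; the cut rule is an assumption.
    """
    counts = Counter(requests)
    # Rank files by observed request count, most requested first; ties go to the lower id.
    ranked = sorted(range(num_files), key=lambda f: (-counts[f], f))
    if num_files < 2:
        return ranked, []
    # Difference in request count between each pair of neighbors in the ranking.
    gaps = [counts[ranked[i]] - counts[ranked[i + 1]] for i in range(num_files - 1)]
    # Cut right after the position with the largest gap (the first one on ties).
    cut = max(range(len(gaps)), key=lambda i: gaps[i]) + 1
    return ranked[:cut], ranked[cut:]


# Example: files 0 and 1 are requested far more often than files 2, 3, and 4.
requests = [0, 0, 0, 1, 1, 1, 0, 2, 3, 1, 0]
popular, non_popular = partition_by_count_gap(requests, num_files=5)
print(popular, non_popular)
```

Because the split depends only on the relative order of counts and the gaps between them, it does not require an accurate estimate of each file's popularity.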

LG, Dec 21, 2024
Subgoal Discovery Using a Free Energy Paradigm and State Aggregations

Amirhossein Mesbah, Reshad Hosseini, Seyed Pooya Shariatpanahi et al.

Reinforcement learning (RL) plays a major role in solving complex sequential decision-making tasks. Hierarchical and goal-conditioned RL are promising methods for dealing with two major problems in RL, namely sample inefficiency and difficulties in reward shaping. These methods tackle the mentioned problems by decomposing a task into simpler subtasks and temporally abstracting a task in the action space. One of the key components for task decomposition of these methods is subgoal discovery. We can use the subgoal states to define hierarchies of actions and also use them in decomposing complex tasks. Under the assumption that subgoal states are more unpredictable, we propose a free energy paradigm to discover them. This is achieved by using free energy to select between two spaces, the main space and an aggregation space. The model changes from neighboring states to a given state indicate how unpredictable that state is, and they are therefore used in this paper for subgoal discovery. Our empirical results on navigation tasks like grid-world environments show that our proposed method can be applied for subgoal discovery without prior knowledge of the task. Our proposed method is also robust to the stochasticity of environments.

NI, Jan 16, 2024
Generative AI for O-RAN Slicing: A Semi-Supervised Approach with VAE and Contrastive Learning

Salar Nouri, Mojdeh Karbalaee Motalleb, Vahid Shah-Mansouri et al.

This paper introduces a novel generative AI (GAI)-driven, unified semi-supervised learning architecture for optimizing resource allocation and network slicing in O-RAN. Termed Generative Semi-Supervised VAE-Contrastive Learning, our approach maximizes the weighted user equipment (UE) throughput and allocates physical resource blocks (PRBs) to enhance the quality of service for eMBB and URLLC services. The GAI framework utilizes a dedicated xApp for intelligent power control and PRB allocation. This integrated GAI model synergistically combines the generative power of a VAE with contrastive learning to achieve robustness in an end-to-end trainable system. It is a semi-supervised training approach that concurrently optimizes supervised regression of resource allocation decisions (i.e., power, UE association, PRB) and unsupervised contrastive objectives. This intrinsic fusion improves the precision of resource management and model generalization in dynamic mobile networks. We evaluated our GAI methodology against exhaustive search and deep Q-Network algorithms using key performance metrics. Results show our integrated GAI approach offers superior efficiency and effectiveness in various scenarios, presenting a compelling GAI-based solution for critical network slicing and resource management challenges in next-generation O-RAN systems.

LG, Dec 13, 2020
Reinforcement Learning with Subspaces using Free Energy Paradigm

Milad Ghorbani, Reshad Hosseini, Seyed Pooya Shariatpanahi et al.

In large-scale problems, standard reinforcement learning algorithms suffer from slow learning speed. In this paper, we follow the framework of using subspaces to tackle this problem. We propose a free-energy minimization framework for selecting the subspaces and integrate the policy of the state space into the subspaces. Our proposed free-energy minimization framework rests upon the Thompson sampling policy and the behavioral policies of the subspaces and the state space. It is therefore applicable to a variety of tasks, with discrete or continuous state spaces, in both model-free and model-based settings. Through a set of experiments, we show that this general framework substantially improves the learning speed. We also provide a convergence proof.
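For readers unfamiliar with the Thompson sampling policy the abstract builds on, the following is a minimal sketch of standard Beta-Bernoulli Thompson sampling on a multi-armed bandit. It does not implement the paper's free-energy subspace selection; the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example reward probabilities are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(success_probs, steps, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return the pull count per arm."""
    rng = random.Random(seed)
    n = len(success_probs)
    wins = [1] * n    # Beta alpha parameters, starting from a uniform prior
    losses = [1] * n  # Beta beta parameters, starting from a uniform prior
    pulls = [0] * n
    for _ in range(steps):
        # Draw one plausible success rate per arm from its current posterior.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(n)]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (hidden) true success probability.
        reward = 1 if rng.random() < success_probs[arm] else 0
        # Update the posterior of the arm that was pulled.
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.8], steps=2000)
print(pulls)
```

Sampling from the posterior rather than using its mean is what drives exploration: arms with few observations have wide posteriors and are occasionally sampled high enough to be tried again.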

NI, Jan 10, 2020
Classification of Traffic Using Neural Networks by Rejecting: a Novel Approach in Classifying VPN Traffic

Ali Parchekani, Salar Nouri, Vahid Shah-Mansouri et al.

In this paper, we introduce a novel end-to-end traffic classification method to distinguish between traffic classes, including VPN traffic, in three layers of the Open Systems Interconnection (OSI) model. Classification of VPN traffic is not trivial using traditional classification approaches due to its encrypted nature. We utilize two well-known neural networks, namely a multi-layer perceptron and a recurrent neural network, to create our cascade neural network focused on two metrics: class scores and distance from the center of the classes. This approach combines extraction, selection, and classification functionality into a single end-to-end system to systematically learn the non-linear relationship between input and predicted performance. Therefore, we can distinguish VPN traffic from non-VPN traffic by rejecting the unrelated features of the VPN class. Moreover, we obtain the application type of non-VPN traffic at the same time. The approach is evaluated using the general traffic dataset ISCX VPN-nonVPN and an acquired dataset. The results demonstrate the efficacy of the framework for encrypted traffic classification: it achieves an accuracy of 95 percent, which is higher than that of the state-of-the-art models, along with strong generalization capabilities.
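The two metrics named in the abstract, class scores and distance from the class centers, can be combined into a simple reject rule. The sketch below is one possible reading under stated assumptions, not the authors' cascade network: the function name `classify_with_reject`, the thresholds `min_score` and `max_dist`, and the use of Euclidean distance are all hypothetical choices for illustration.

```python
import math


def classify_with_reject(scores, features, centers, min_score=0.6, max_dist=1.5):
    """Return the predicted class index, or None when the sample is rejected.

    scores:   per-class scores, e.g. softmax outputs of a classifier
    features: feature vector of the sample
    centers:  one center vector per class, in the same feature space
    """
    # Pick the class with the highest score.
    best = max(range(len(scores)), key=lambda k: scores[k])
    # Euclidean distance from the sample to the center of the predicted class.
    dist = math.dist(features, centers[best])
    # Reject when the classifier is unsure or the sample lies far from the class center.
    if scores[best] < min_score or dist > max_dist:
        return None
    return best


centers = [[0.0, 0.0], [3.0, 3.0]]
accepted = classify_with_reject([0.9, 0.1], [0.2, 0.1], centers)
low_score = classify_with_reject([0.55, 0.45], [0.2, 0.1], centers)
far_away = classify_with_reject([0.1, 0.9], [0.0, 0.0], centers)
print(accepted, low_score, far_away)
```

A rejected sample (here `None`) could then be passed to a later stage of a cascade, which is the general role a reject option plays in such designs.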