Mahsa Derakhshan

DS
4papers
12citations
Novelty57%
AI Score46

4 Papers

37.5ITMar 17
Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

Marios Aristodemou, Yasaman Omid, Sangarapillai Lambotharan et al.

The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of services is highly dependant on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BS) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm which aims for maximising the sum-rate of the users, while coping with the outdated CSI. We design a novel bi-level optimisation, procedure themes as dual stage proximal policy optimisation (DS-PPO), for tackling the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attached by the use of DS-PPO. In addition, we provide the convergence analysis for the DS-PPO along with the computational complexity.

19.3DSMar 13
Approximation Algorithms for Action-Reward Query-Commit Matching

Mahsa Derakhshan, Andisheh Ghasemi, Calum MacRury

Matching problems under uncertainty arise in applications such as kidney exchange, hiring, and online marketplaces. A decision-maker must sequentially explore potential matches under local exploration constraints, while committing irrevocably to successful matches as they are revealed. The query-commit matching problem captures these challenges by modeling edges that succeed independently with known probabilities and must be accepted upon success, subject to vertex patience (time-out) constraints limiting the number of incident queries. In this work, we introduce the action-reward query-commit matching problem, a strict generalization of query-commit matching in which each query selects an action from a known action space, determining both the success probability and the reward of the queried edge. If an edge is queried using a chosen action and succeeds, it is irrevocably added to the matching, and the corresponding reward is obtained; otherwise, the edge is permanently discarded. We study the design of approximation algorithms for this problem on bipartite graphs. This model captures a broad class of stochastic matching problems, including the sequential pricing problem introduced by Pollner, Roghani, Saberi, and Wajc (EC~2022). On the positive side, Pollner et al. designed a polynomial-time approximation algorithm achieving a ratio of $0.426$ in the one-sided patience setting, which degrades to $0.395$ when both sides have bounded patience. In this work, we design computationally efficient algorithms for the action-reward query-commit in one-sided and two-sided patience settings, achieving approximation ratios of $1-1/e \approx 0.63$ and $\frac{1}{27}\!\left(19-67/e^3\right) \approx 0.58$ respectively. These results improve the state of the art for the sequential pricing problem, surpassing the previous guarantees of $0.426$ and $0.395$.

90.2DSApr 1
A Unified Framework for Analysis of Randomized Greedy Matching Algorithms

Mahsa Derakhshan, Tao Yu

Randomized greedy algorithms form one of the simplest yet most effective approaches for computing approximate matchings in graphs. In this paper, we focus on the class of vertex-iterative (VI) randomized greedy matching algorithms, which process the vertices of a graph $G=(V,E)$ in some order $π$ and, for each vertex $v$, greedily match it to the first available neighbor according to a preference order $σ(v)$. Various VI algorithms have been studied, each corresponding to a different distribution over $π$ and $σ(v)$. We develop a unified framework for analyzing this family of algorithms and use it to obtain improved approximation ratios for Ranking and FRanking, the state-of-the-art randomized greedy algorithms for the random-order and adversarial-order settings, respectively. In Ranking, the decision order is drawn uniformly at random and used as the common preference order, whereas FRanking uses an adversarial decision order and a uniformly random preference order shared by all vertices. We obtain an approximation ratio of $0.560$ for Ranking, improving on the $0.5469$ bound of Derakhshan et al. [SODA 2026]. For FRanking, we obtain a ratio of $0.539$, improving on the $0.521$ bound of Huang et al. [JACM 2020]. These results also imply state-of-the-art approximation ratios for oblivious matching and fully online matching problems on general graphs. Our analysis framework also enables us to prove improved approximation ratios for graphs with no short odd cycles. Such graphs form an intermediate class between general graphs and bipartite graphs. In particular, we show that Ranking is at least $0.570$-competitive for graphs that are both triangle-free and pentagon-free. For graphs whose shortest odd cycle has length at least $129$, we prove that Ranking is at least $0.615$-competitive.

GTMay 27, 2023
Learning and Collusion in Multi-unit Auctions

Simina Brânzei, Mahsa Derakhshan, Negin Golrezaei et al.

We consider repeated multi-unit auctions with uniform pricing, which are widely used in practice for allocating goods such as carbon licenses. In each round, $K$ identical units of a good are sold to a group of buyers that have valuations with diminishing marginal returns. The buyers submit bids for the units, and then a price $p$ is set per unit so that all the units are sold. We consider two variants of the auction, where the price is set to the $K$-th highest bid and $(K+1)$-st highest bid, respectively. We analyze the properties of this auction in both the offline and online settings. In the offline setting, we consider the problem that one player $i$ is facing: given access to a data set that contains the bids submitted by competitors in past auctions, find a bid vector that maximizes player $i$'s cumulative utility on the data set. We design a polynomial time algorithm for this problem, by showing it is equivalent to finding a maximum-weight path on a carefully constructed directed acyclic graph. In the online setting, the players run learning algorithms to update their bids as they participate in the auction over time. Based on our offline algorithm, we design efficient online learning algorithms for bidding. The algorithms have sublinear regret, under both full information and bandit feedback structures. We complement our online learning algorithms with regret lower bounds. Finally, we analyze the quality of the equilibria in the worst case through the lens of the core solution concept in the game among the bidders. We show that the $(K+1)$-st price format is susceptible to collusion among the bidders; meanwhile, the $K$-th price format does not have this issue.