Sarah H. Q. Li

h-index3

6papers

3citations

Novelty45%

AI Score44

Ranked #71,432 of 201,326 authors (top 35%)#16,177 in LG (top 38%)

6 Papers

LGJul 15, 2022

Set-based value operators for non-stationary Markovian environments

Sarah H. Q. Li, Assalé Adjé, Pierre-Loïc Garoche et al.

This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. To this end, we generalize the Bellman and policy evaluation operators to contracting operators on the value function space and denote them as \emph{value operators}. We lift these value operators to act on \emph{sets} of value functions and denote them as \emph{set-based value operators}. We prove that the set-based value operators are \emph{contractions} in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition in classic robust MDP literature to a containment condition for all value operators, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contracting operators in dynamic programming. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own extrema elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function. Finally, we derive novel guarantees for probabilistic path-planning problems in planet exploration and stratospheric station-keeping.

LGMay 5, 2022

General sum stochastic games with networked information flows

Sarah H. Q. Li, Lillian J. Ratliff, Peeyush Kumar

Inspired by applications such as supply chain management, epidemics, and social networks, we formulate a stochastic game model that addresses three key features common across these domains: 1) network-structured player interactions, 2) pair-wise mixed cooperation and competition among players, and 3) limited global information toward individual decision-making. In combination, these features pose significant challenges for black box approaches taken by deep learning-based multi-agent reinforcement learning (MARL) algorithms and deserve more detailed analysis. We formulate a networked stochastic game with pair-wise general sum objectives and asymmetrical information structure, and empirically explore the effects of information availability on the outcomes of different MARL paradigms such as individual learning and centralized learning decentralized execution.

65.2SYApr 7

Distributionally Robust Tolls for Traffic Networks with Affine Latency Functions

Chih-Yuan Chiu, Sarah H. Q. Li, Bryce L. Ferguson

In network congestion games, system operators often utilize latency models, estimated from real-world traffic flow and travel time data, to design monetary incentives which steer equilibrium user behaviors towards lowering system-wide latency. This work studies the impact of latency model uncertainty when designing incentives in non-atomic network congestion games. Our approach leverages distributionally robust optimization (DRO), which captures data-driven uncertainty in latency models by considering worst-case distribution shifts. We prove that, under mild and practically relevant assumptions, the distributionally robust tolling problem in single origin-destination, affine-latency congestion games can be solved via convex programming. Numerical simulations illustrate that tolls designed to be distributionally robust against unknown disturbances can outperform tolls designed using fixed, nominal disturbance models in minimizing system-wide latency.

4.8SYApr 9

Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure

Adam Casselman, Abraham P. Vinod, Sarah H. Q. Li

We optimize finite horizon multi-agent reach-avoid Markov decision process (MDP) via \emph{local feedback policies}. The global feedback policy solution yields global optimality but its communication complexity, memory usage and computation complexity scale exponentially with the number of agents. We mitigate this exponential dependency by restricting the solution space to local feedback policies and show that local feedback policies are rank-one factorizations of global feedback policies, which provides a principled approach to reducing communication complexity and memory usage. Additionally, by demonstrating that multi-agent reach-avoid MDPs over local feedback policies has a potential game structure, we show that iterative best response is a tractable multi-agent learning scheme with guaranteed convergence to deterministic Nash equilibrium, and derive each agent's best response via multiplicative dynamic program (DP) over the joint state space. Numerical simulations across different MDPs and agent sets show that the peak memory usage and offline computation complexity are significantly reduced while the approximation error to the optimal global reach-avoid objective is maintained.

13.2SYApr 8

When the Correct Model Fails: The Optimality of Stackelberg Equilibria with Follower Intention Updates

Cayetana Salinas-Rodriguez, Jonathan Rogers, Sarah H. Q. Li

We study a two-player dynamic Stackelberg game where the follower's intention is unknown to the leader. Classical formulations of the Stackelberg equilibrium (SE) assume that the follower's best response (BR) function is known to the leader. However, this is not always true in practice. We study a setting in which the leader receives updated beliefs about the follower BR before the end of the game, such that the update prompts the leader and subsequently the follower to re-optimize their strategies. We characterize the optimality guarantees of the SE solutions under this belief update for both open loop and feedback information structures. Interestingly, we prove that in general, assuming an incorrect follower's BR may lead to a lower leader cost over the entire game than knowing the true follower's BR. We support these results with numerical examples in a linear quadratic (LQ) Stackelberg game, and use Monte Carlo simulations to show that the instances of incorrect BR achieving lower leader costs are non-trivial in collision avoidance LQ Stackelberg games.

LGAug 7, 2025

A Markov Decision Process Framework for Early Maneuver Decisions in Satellite Collision Avoidance

Francesca Ferrara, Lander W. Schillinger Arana, Florian Dörfler et al.

This work presents a Markov decision process (MDP) framework to model decision-making for collision avoidance maneuver (CAM) and a reinforcement learning policy gradient (RL-PG) algorithm to train an autonomous guidance policy using historic CAM data. In addition to maintaining acceptable collision risks, this approach seeks to minimize the average fuel consumption of CAMs by making early maneuver decisions. We model CAM as a continuous state, discrete action and finite horizon MDP, where the critical decision is determining when to initiate the maneuver. The MDP model also incorporates analytical models for conjunction risk, propellant consumption, and transit orbit geometry. The Markov policy effectively trades-off maneuver delay-which improves the reliability of conjunction risk indicators-with propellant consumption-which increases with decreasing maneuver time. Using historical data of tracked conjunction events, we verify this framework and conduct an extensive ablation study on the hyper-parameters used within the MDP. On synthetic conjunction events, the trained policy significantly minimizes both the overall and average propellant consumption per CAM when compared to a conventional cut-off policy that initiates maneuvers 24 hours before the time of closest approach (TCA). On historical conjunction events, the trained policy consumes more propellant overall but reduces the average propellant consumption per CAM. For both historical and synthetic conjunction events, the trained policy achieves equal if not higher overall collision risk guarantees.