Min Wen

h-index: 8

Papers: 4

Citations: 299

Novelty: 51%

AI Score: 28

Ranked #150,126 of 194,257 authors (top 77%); #9,064 in AI (top 72%)

4 Papers

1.2 | SY | Nov 2, 2015

Strategy Synthesis for Stochastic Rabin Games with Discounted Reward

Min Wen, Ufuk Topcu

Stochastic games are often used to model reactive processes. We consider the problem of synthesizing an optimal almost-sure winning strategy in a two-player (namely a system and its environment) turn-based stochastic game with both a qualitative objective, given as a Rabin winning condition, and a quantitative objective, given as a discounted reward. Optimality is considered only over the almost-sure winning strategies, i.e., system strategies that guarantee the satisfaction of the Rabin condition with probability 1 regardless of the environment's strategy. We show that optimal almost-sure winning strategies may need infinite memory, but epsilon-optimal almost-sure winning strategies can always be finite-memory or even memoryless. We identify a necessary and sufficient condition for the existence of memoryless epsilon-optimal almost-sure winning strategies and propose an algorithm to compute one when this condition is satisfied.

19.7 | LG | Jan 24, 2019 | Code

Algorithms for Fairness in Sequential Decision Making

Min Wen, Osbert Bastani, Ufuk Topcu

It has recently been shown that if feedback effects of decisions are ignored, then imposing fairness constraints such as demographic parity or equality of opportunity can actually exacerbate unfairness. We propose to address this challenge by modeling feedback effects as Markov decision processes (MDPs). First, we propose analogs of fairness properties for the MDP setting. Second, we propose algorithms for learning fair decision-making policies for MDPs. Finally, we demonstrate the need to account for dynamical effects using simulations on a loan applicant MDP.

27.2 | AI | Apr 14, 2017

Environment-Independent Task Specifications via GLTL

Michael L. Littman, Ufuk Topcu, Jie Fu et al.

We propose a new task-specification language for Markov decision processes that is designed to be an improvement over reward functions by being environment independent. The language is a variant of Linear Temporal Logic (LTL) that is extended to probabilistic specifications in a way that permits approximations to be learned in finite time. We provide several small environments that demonstrate the advantages of our geometric LTL (GLTL) language and illustrate how it can be used to specify standard reinforcement-learning tasks straightforwardly.

13.8 | LO | Mar 5, 2015

Correct-by-synthesis reinforcement learning with temporal logic constraints

Min Wen, Ruediger Ehlers, Ufuk Topcu

We consider the problem of synthesizing reactive controllers that optimize some a priori unknown performance criterion while interacting with an uncontrolled environment such that the system satisfies a given temporal logic specification. We decouple the problem into two subproblems. First, we extract a (maximally) permissive strategy for the system, which encodes multiple (possibly all) ways in which the system can react to the adversarial environment and satisfy the specifications. Then, we quantify the a priori unknown performance criterion as a (still unknown) reward function and compute an optimal strategy for the system within the operating envelope allowed by the permissive strategy by using the so-called maximin-Q learning algorithm. We establish both correctness (with respect to the temporal logic specifications) and optimality (with respect to the a priori unknown performance criterion) of this two-step technique for a fragment of temporal logic specifications. For specifications beyond this fragment, correctness can still be preserved, but the learned strategy may be sub-optimal. We present an algorithm for the overall problem and demonstrate its use and computational requirements on a set of robot motion planning examples.
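
The second step of the abstract above (maximin-Q learning restricted to the actions a permissive strategy allows) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a deterministic, turn-based toy game in which the environment responds after seeing the system's action, so pure maximin values suffice. The names `maximin_q`, `toy_step`, `ALLOWED`, and the "stay"/"move" and "calm"/"push" actions are all hypothetical, and the permissive strategy is assumed to be given as a table of allowed system actions per state.

```python
import random
from collections import defaultdict


def maximin_q(states, allowed, env_actions, step,
              episodes=2000, horizon=20, alpha=0.1, gamma=0.9, seed=0):
    """Tabular maximin-Q learning over system actions in `allowed` only.

    `allowed[s]` stands in for a precomputed permissive strategy
    (hypothetical input); `step(s, a, b)` returns (next_state, reward).
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, system_action, env_action)]

    def worst_case(s, a):
        # Value of system action a if the environment replies adversarially.
        return min(q[(s, a, b)] for b in env_actions)

    def value(s):
        # Best worst-case value among the actions the permissive strategy allows.
        return max(worst_case(s, a) for a in allowed[s])

    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(horizon):
            a = rng.choice(allowed[s])      # explore only inside the envelope
            b = rng.choice(env_actions)     # environment explored uniformly
            s_next, r = step(s, a, b)
            target = r + gamma * value(s_next)
            q[(s, a, b)] += alpha * (target - q[(s, a, b)])
            s = s_next

    policy = {s: max(allowed[s], key=lambda a, s=s: worst_case(s, a))
              for s in states}
    return q, policy


# Toy game (assumption for illustration): two states, two allowed actions.
STATES = [0, 1]
ALLOWED = {0: ["stay", "move"], 1: ["stay", "move"]}
ENV_ACTIONS = ["calm", "push"]


def toy_step(s, a, b):
    if a == "stay":
        # Staying pays off only if the environment stays calm.
        return s, (1.0 if b == "calm" else 0.0)
    # Moving switches state and pays a fixed reward regardless of b.
    return 1 - s, 0.5


q, policy = maximin_q(STATES, ALLOWED, ENV_ACTIONS, toy_step)
print(policy)
```

Because exploration and the final policy both draw only from `allowed[s]`, any action excluded by the permissive strategy is never tried or selected, which mirrors how the two-step decomposition keeps correctness separate from reward optimization.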