FLFeb 15, 2013
Asynchronous Games over Tree ArchitecturesBlaise Genest, Hugo Gimbert, Anca Muscholl et al.
We consider the task of controlling in a distributed way a Zielonka asynchronous automaton. Every process of a controller has access to its causal past to determine the next set of actions it proposes to play. An action can be played only if every process controlling this action proposes to play it. We consider reachability objectives: every process should reach its set of final states. We show that this control problem is decidable for tree architectures, where every process can communicate with its parent, its children, and with the environment. The complexity of our algorithm is l-fold exponential with l being the height of the tree representing the architecture. We show that this is unavoidable by showing that even for three processes the problem is EXPTIME-complete, and that it is non-elementary in general.
94.7FLApr 14
Characterizing normality via automata and random matrix productsLaurent Bienvenu, Santiago Cifuentes, Hugo Gimbert
For a fixed alphabet A, an infinite sequence X is said to be normal if every word w over A appears in X with the same frequency as any other word of the same length. A classical result relates normality to finite automata as follows: a sequence X is normal if and only if all gambling strategies implementable with finite deterministic automata lose all their capital when trying to predict the next bit of X after seing the ones before. More precisely, Schnorr and Stimm (1972) proved that the capital goes exponentially fast to zero unless the automaton represents the gambler that never bets, in which case the capital remains constant. In this paper we show that an analogous result holds when considering probabilistic automata: a sequence X is normal if and only if for any gambling strategy implementable with probabilistic finite automaton it holds that the expected value of the capital of the gambler converges exponentially fast to a finite value when playing against X. To obtain this result, we show a more general statement related to the convergence of martingales given by finite sets of non-negative matrices {M a } a$\in$A . In particular, we show that X is normal if and only if ||vM X[1] . . . M X[n] || converges exponentially fast to a finite value for any non-negative starting vector v. Moreover, we distinguish three distinctive behaviours that this sequence can attain, and prove that the problem of recognizing, given a family of matrices, to which case it belongs, is decidable.
AIDec 16, 2024
Revelations: A Decidable Class of POMDPs with Omega-Regular ObjectivesMarius Belly, Nathanaël Fijalkow, Hugo Gimbert et al.
Partially observable Markov decision processes (POMDPs) form a prominent model for uncertainty in sequential decision making. We are interested in constructing algorithms with theoretical guarantees to determine whether the agent has a strategy ensuring a given specification with probability 1. This well-studied problem is known to be undecidable already for very simple omega-regular objectives, because of the difficulty of reasoning on uncertain events. We introduce a revelation mechanism which restricts information loss by requiring that almost surely the agent has eventually full information of the current state. Our main technical results are to construct exact algorithms for two classes of POMDPs called weakly and strongly revealing. Importantly, the decidable cases reduce to the analysis of a finite belief-support Markov decision process. This yields a conceptually simple and exact algorithm for a large class of POMDPs.
FLJul 7, 2017
Controlling a PopulationNathalie Bertrand, Miheer Dewaskar, Blaise Genest et al.
We introduce a new setting where a population of agents, each modelled by a finite-state system, are controlled uniformly: the controller applies the same action to every agent. The framework is largely inspired by the control of a biological system, namely a population of yeasts, where the controller may only change the environment common to all cells. We study a synchronisation problem for such populations: no matter how individual agents react to the actions of the controller , the controller aims at driving all agents synchronously to a target state. The agents are naturally represented by a non-deterministic finite state automaton (NFA), the same for every agent, and the whole system is encoded as a 2-player game. The first player (Controller) chooses actions, and the second player (Agents) resolves non-determinism for each agent. The game with m agents is called the m-population game. This gives rise to a parameterized control problem (where control refers to 2 player games), namely the population control problem: can Controller control the m-population game for all $m $\in$ N$ whatever Agents does? In this paper, we prove that the population control problem is decidable, and it is a EXPTIME-complete problem. As far as we know, this is one of the first results on parameterized control. Our algorithm, not based on cutoff techniques, produces winning strategies which are symbolic, that is, they do not need to count precisely how the population is spread between states. We also show that if there is no winning strategy, then there is a population size M such that Controller wins the m-population game if and only if $m $\le$ M$. Surprisingly, M can be doubly exponential in the number of states of the NFA, with tight upper and lower bounds.
AIDec 12, 2016
Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision ProcessesLudovic Hofer, Hugo Gimbert
This paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable with a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the number of samples used. Second, our method is versatile: it does not rely on any a priori knowledge of the structure of optimal policies. We build upon the Fitted Q-iteration algorithm which represents the $Q$-value as the average of several regression trees. Our algorithm, the Fitted Policy Forest algorithm (FPF), computes a regression forest representing the Q-value and transforms it into a single tree representing the policy, while keeping control on the size of the policy using resampling and leaf merging. We introduce an adaptation of Multi-Resolution Exploration (MRE) which is particularly suited to FPF. We assess the performance of FPF on three classical benchmarks for reinforcement learning: the "Inverted Pendulum", the "Double Integrator" and "Car on the Hill" and show that FPF equals or outperforms other algorithms, although these algorithms rely on the use of particular representations of the policies, especially chosen in order to fit each of the three problems. Finally, we exhibit that the combination of FPF and MRE allows to find nearly optimal solutions in problems where $ε$-greedy approaches would fail.