Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective POMDPs and its Impact on Randomised Strategies
This work addresses a theoretical question about multi-objective POMDPs that is relevant to researchers in decision theory and AI. It shows that randomisation is needed in general to achieve a given expected payoff vector, and that mixing finitely many pure strategies suffices. The contribution is incremental but clarifies foundational aspects.
The paper studies which expected payoff vectors can be achieved in multi-objective partially observable Markov decision processes (POMDPs) by analysing the structure of payoff sets and the kind of strategies that suffice to achieve them. It proves that mixing finitely many pure strategies approximates any expected payoff vector up to any precision and that, when the expected payoff is finite under all strategies, every expected payoff vector can be achieved exactly by a finite mixture.
We consider multi-dimensional payoff functions in partially observable Markov decision processes. We study the structure of the set of expected payoff vectors of all strategies (policies) and what kind of strategies are needed to achieve a given expected payoff vector. In general, pure strategies (i.e., strategies that do not resort to randomisation) do not suffice for this problem. We prove that for any payoff for which the expectation is well-defined under all strategies, it is sufficient to mix finitely many pure strategies (i.e., to randomly select a pure strategy at the start of a play and commit to it for the rest of the play) to approximate any expected payoff vector up to any precision. Furthermore, for any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies.
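As a small illustration of why mixing helps, the following Python sketch computes the expected payoff vector of a mixture as the weighted average of the expected payoff vectors of its pure strategies, which follows from linearity of expectation. The function name `mix_payoff` and the two payoff vectors are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def mix_payoff(weights, payoffs):
    """Expected payoff vector of a mixture of pure strategies.

    weights[i] is the probability of selecting pure strategy i at the
    start of the play; payoffs[i] is that strategy's expected payoff
    vector (assumed finite). Both inputs are illustrative assumptions.
    """
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must form a probability distribution")
    dim = len(payoffs[0])
    # Linearity of expectation: average each coordinate with the weights.
    return tuple(
        sum(w * p[k] for w, p in zip(weights, payoffs)) for k in range(dim)
    )


# Hypothetical expected payoffs of two pure strategies for two objectives.
pure_payoffs = [(Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]

# Neither pure strategy yields (1/2, 1/2), but an even mixture does.
weights = [Fraction(1, 2), Fraction(1, 2)]
mixed = mix_payoff(weights, pure_payoffs)
```

In this toy setting, the target vector (1/2, 1/2) lies strictly between the two pure payoff vectors, so it is reachable only by randomising over them.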