LG AI | Jul 25, 2022

Modelling non-reinforced preferences using selective attention

Noor Sajid, Panagiotis Tigas, Zafeirios Fountas, Qinghai Guo, Alexey Zakharov, Lancelot Da Costa

arXiv:2207.13699v13.31 citations | h-index: 20

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of autonomous preference formation in AI agents, though it appears incremental because it builds on existing methods such as world models and attention mechanisms.

The paper tackles the problem of enabling artificial agents to learn non-reinforced preferences so that they can adapt to changing environments. It proposes the Nore mechanism, which uses selective attention, and reports that Nore forms exploratory preferences without external signals in a modified FrozenLake environment.

How can artificial agents learn non-reinforced preferences to continuously adapt their behaviour to a changing environment? We decompose this question into two challenges: (i) encoding diverse memories and (ii) selectively attending to these for preference formation. Our proposed non-reinforced preference learning mechanism using selective attention, Nore, addresses both by leveraging the agent's world model to collect a diverse set of experiences, which are interleaved with imagined roll-outs to encode memories. These memories are selectively attended to, using attention and gating blocks, to update the agent's preferences. We validate Nore in a modified OpenAI Gym FrozenLake environment (without any external signal), with and without volatility, under a fixed model of the environment, and compare its behaviour to Pepper, a Hebbian preference learning mechanism. We demonstrate that Nore provides a straightforward framework to induce exploratory preferences in the absence of external signals.
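
To make the idea of attention and gating over memories concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper. It assumes, purely for illustration, that each memory is a state-occupancy distribution over the 16 cells of a 4x4 grid, that attention is scaled dot-product attention against a query vector, and that a single scalar sigmoid gate blends the attended summary with the prior preferences. The names `update_preferences`, `gate_w`, `gate_b`, and `random_occupancy` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_preferences(memories, query, prior, gate_w, gate_b=0.0):
    """Blend attended memories into a preference distribution (illustrative only).

    memories: list of N occupancy distributions, each of length S
    query:    length-S vector describing the current context
    prior:    length-S current preference distribution
    gate_w:   length-2S weights of the (hypothetical) scalar gate
    """
    d = len(query)
    # Attention: score each memory against the query, then normalise.
    scores = [sum(m_s * q_s for m_s, q_s in zip(m, query)) / math.sqrt(d)
              for m in memories]
    weights = softmax(scores)
    # Attended summary: convex combination of memories, so it sums to 1.
    summary = [sum(w * m[s] for w, m in zip(weights, memories)) for s in range(d)]
    # Gate: decides how much of the summary replaces the prior preferences.
    gate_in = summary + prior  # list concatenation, length 2S
    g = sigmoid(sum(w * x for w, x in zip(gate_w, gate_in)) + gate_b)
    new_pref = [g * c + (1.0 - g) * p for c, p in zip(summary, prior)]
    return new_pref, weights


def random_occupancy(n_states):
    """Hypothetical stand-in for an encoded memory: a random distribution."""
    draws = [random.random() for _ in range(n_states)]
    total = sum(draws)
    return [x / total for x in draws]


random.seed(0)
n_states = 16                                   # 4x4 grid, as in FrozenLake
memories = [random_occupancy(n_states) for _ in range(5)]
prior = [1.0 / n_states] * n_states             # flat initial preferences
query = memories[-1]                            # assume the latest memory is the context
new_pref, weights = update_preferences(
    memories, query, prior, gate_w=[0.0] * (2 * n_states)
)
```

With the gate weights at zero the gate evaluates to 0.5, so the updated preferences are an even mix of the prior and the attended summary. A strongly negative `gate_b` leaves the prior essentially unchanged, which is one simple way to picture preferences that update only when the gate opens.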
