AI | May 17, 2017

Identification and Off-Policy Learning of Multiple Objectives Using Adaptive Clustering

arXiv:1705.06342v1 | 17 citations
Originality: Incremental advance
AI Analysis

The approach could benefit scenarios where objectives are unknown or exploration is costly, but it appears incremental because it builds on existing clustering and Q-learning methods.

The paper addresses how an agent can autonomously identify and learn multiple objectives in an environment without prior knowledge of them. It combines an adaptive clustering algorithm with off-policy Q-learning and reports, in simulated tests, that knowledge about multiple objectives accumulates without additional exploration.

In this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The identification of objectives is achieved using an online and unsupervised adaptive clustering algorithm. The identified objectives are learned (at least partially) in parallel using Q-learning. Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown or in real-world scenarios where exploration is typically a time- and energy-intensive process. The implications and possible extensions of this work are also briefly discussed.
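
The idea in the abstract, identifying objectives online and then learning all of them from the same stream of exploratory transitions, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's method: the distance-threshold clustering rule (`OnlineClusterer`), the seven-state corridor, the noisy two-dimensional stimulus emitted at each end, the reward of 1 for reaching an objective, and all names and constants. The abstract refers to value function weights, which suggests function approximation; tables are used here only to keep the example short.

```python
import random

N_STATES = 7          # corridor states 0..6 (hypothetical toy environment)
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
ALPHA, GAMMA = 0.5, 0.9


class OnlineClusterer:
    """Distance-threshold online clustering (an assumed stand-in, not the paper's)."""

    def __init__(self, radius):
        self.radius = radius
        self.centroids = []
        self.counts = []

    def observe(self, x):
        """Assign x to the nearest centroid within radius, else open a new cluster."""
        best, best_d = None, None
        for i, c in enumerate(self.centroids):
            d = sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.radius:
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Move the matched centroid towards x with a running mean.
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [c + (a - c) / n for a, c in zip(x, self.centroids[best])]
        return best


def step(state, action, rng):
    """Move along the corridor; each end emits a noisy 2-D stimulus."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    if nxt == 0:
        base = (1.0, 0.0)
    elif nxt == N_STATES - 1:
        base = (0.0, 1.0)
    else:
        return nxt, None
    return nxt, tuple(b + rng.gauss(0.0, 0.05) for b in base)


def greedy(q, state):
    """Index of the highest-valued action in `state` for one objective's table."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def run_demo(steps=20000, seed=0):
    rng = random.Random(seed)
    clusterer = OnlineClusterer(radius=0.5)
    q_tables = []                      # one Q-table per identified objective
    state = N_STATES // 2
    for _ in range(steps):
        action = rng.randrange(len(ACTIONS))   # purely exploratory behaviour policy
        nxt, stimulus = step(state, action, rng)
        hit = clusterer.observe(stimulus) if stimulus is not None else None
        # A newly opened cluster becomes a new objective with its own table.
        while len(q_tables) < len(clusterer.centroids):
            q_tables.append([[0.0] * len(ACTIONS) for _ in range(N_STATES)])
        # Off-policy Q-learning: every objective is updated from the same transition.
        for k, q in enumerate(q_tables):
            target = 1.0 if k == hit else GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
        state = nxt
    return clusterer, q_tables


if __name__ == "__main__":
    clusterer, q_tables = run_demo()
    for k, q in enumerate(q_tables):
        actions = [greedy(q, s) for s in range(1, N_STATES - 1)]
        print("objective", k, "centroid", clusterer.centroids[k], "greedy actions", actions)
```

Because the Q-learning target uses the maximum over next-state values and not the action the agent actually took, a single random walk is enough to update every objective's table; no objective needs its own exploration phase. In this sketch an objective only starts learning once its cluster has been opened, which is one way to read the abstract's "at least partially" in parallel.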

