LG · AI · Nov 12, 2018

Agent Embeddings: A Latent Representation for Pole-Balancing Networks

arXiv:1811.04516v4 · 8 citations
Originality: Synthesis-oriented
AI Analysis

The paper provides a method for mapping solution spaces in reinforcement learning, but its contribution is incremental: it applies existing embedding concepts to a specific domain.

The paper addresses the problem of representing neural network agents as low-dimensional vectors, called agent embeddings, by learning a generative model over the agents' weight space. It demonstrates the approach on a Cart-Pole task, showing that interpolating between the embeddings of a good and a bad agent yields networks with tunable performance, and that extrapolation can boost performance up to a point.

We show that it is possible to reduce a high-dimensional object like a neural network agent into a low-dimensional vector representation with semantic meaning that we call agent embeddings, akin to word or face embeddings. This can be done by collecting examples of existing networks, vectorizing their weights, and then learning a generative model over the weight space in a supervised fashion. We investigate a pole-balancing task, Cart-Pole, as a case study and show that multiple new pole-balancing networks can be generated from their agent embeddings without direct access to training data from the Cart-Pole simulator. In general, the learned embedding space is helpful for mapping out the space of solutions for a given task. We observe in the case of Cart-Pole the surprising finding that good agents make different decisions despite learning similar representations, whereas bad agents make similar (bad) decisions while learning dissimilar representations. Linearly interpolating between the latent embeddings for a good agent and a bad agent yields an agent embedding that generates a network with intermediate performance, where the performance can be tuned according to the coefficient of interpolation. Linear extrapolation in the latent space also results in performance boosts, up to a point.
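The two mechanical steps described in the abstract, vectorizing a network's weights and moving linearly between two latent embeddings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer shapes, the 2-dimensional embeddings, and the helper names (flatten_weights, unflatten_weights, interpolate) are hypothetical, and the learned generative model that maps an embedding back to a weight vector is not shown.

```python
import numpy as np

def flatten_weights(params):
    """Concatenate a list of weight arrays into one 1-D vector."""
    return np.concatenate([p.ravel() for p in params])

def unflatten_weights(vector, shapes):
    """Split a 1-D vector back into arrays with the given shapes."""
    params, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        params.append(vector[offset:offset + size].reshape(shape))
        offset += size
    return params

def interpolate(z_a, z_b, alpha):
    """Return (1 - alpha) * z_a + alpha * z_b.

    alpha in [0, 1] interpolates between the two embeddings;
    alpha outside that range extrapolates along the same line.
    """
    return (1.0 - alpha) * z_a + alpha * z_b

# Hypothetical tiny policy network: 4 state inputs -> 8 hidden -> 2 actions.
# The architecture used in the paper may differ.
rng = np.random.default_rng(0)
shapes = [(4, 8), (8,), (8, 2), (2,)]
params = [rng.normal(size=s) for s in shapes]

w = flatten_weights(params)              # one vector of 32 + 8 + 16 + 2 = 58 weights
restored = unflatten_weights(w, shapes)  # round-trips to the original arrays

# Hypothetical 2-D embeddings for a bad and a good agent.
z_bad = np.array([0.0, 0.0])
z_good = np.array([1.0, 2.0])

z_mid = interpolate(z_bad, z_good, 0.5)     # halfway between the two agents
z_beyond = interpolate(z_bad, z_good, 1.5)  # extrapolated past the good agent
```

In the setting the abstract describes, an embedding such as z_mid would be passed through the learned generative model to produce a weight vector, which would then be reshaped into network layers (as unflatten_weights does here) and evaluated on Cart-Pole.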

