LG · AI · ML · Jul 15, 2020

The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

arXiv:2007.08620v2 · 9 citations
AI Analysis

This work addresses uncertainty estimation in sequence prediction, for applications such as natural language processing, and offers a new but incremental method.

The paper tackles the limitation of single-point predictions in transformers by introducing the Sequential Monte Carlo Transformer. The model captures the distribution of the observations and outputs predictive distributions, which improves uncertainty quantification in sequence prediction tasks.
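To make the contrast between a point estimate and a predictive distribution concrete, here is a minimal sketch under hypothetical assumptions that are not taken from the paper. The hidden state is summarized by equally weighted particles `z_i`, and observations follow a linear-Gaussian map `y ~ N(Wo z, omega^2 I)`. The function name `predictive_distribution` and its parameters are illustrative only. The function samples from the resulting particle mixture and reports the predictive mean together with a central interval. A point-estimate model would return only the mean.

```python
import numpy as np


def predictive_distribution(particles, Wo, omega, n_samples=1000, level=0.9, rng=None):
    """Monte Carlo predictive distribution of the next observation.

    Hypothetical setting: equally weighted hidden-state particles z_i and a
    linear-Gaussian observation model y ~ N(Wo z, omega^2 I), so the
    predictive law is the mixture (1/N) * sum_i N(Wo z_i, omega^2 I).
    Returns the predictive mean and a central `level` interval per dimension.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _ = particles.shape
    # Pick one mixture component (particle) per sample, uniformly at random.
    idx = rng.integers(0, n, size=n_samples)
    means = particles[idx] @ Wo.T
    # Add Gaussian observation noise to obtain samples of the next observation.
    samples = means + omega * rng.standard_normal((n_samples, Wo.shape[0]))
    # Empirical quantiles give a central interval instead of a single point.
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(samples, [alpha, 1.0 - alpha], axis=0)
    return samples.mean(axis=0), lower, upper
```

With `level=0.9`, the interval is bounded by the empirical 5% and 95% quantiles of the samples, computed separately for each observation dimension.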

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations within a transformer architecture. The keys, queries, values and attention vectors of the network are treated as the unobserved stochastic states of its hidden structure. In this generative model, the observation received at each time step is a random function of the past states within a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and to estimate the gradient of the log-likelihood. We thus propose a generative model that yields a predictive distribution instead of a single-point estimate.
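The state-space view described above can be illustrated with a bootstrap particle filter. This is a toy, hypothetical instance and not the authors' model or code. It assumes a single stochastic attention layer in which the query, key and value are linear maps of the previous observation plus Gaussian noise (matrices `Wq`, `Wk`, `Wv`, noise scale `sigma`). It also assumes a noisy attention vector `z_t` and a Gaussian observation `y_t ~ N(Wo z_t, omega^2 I)`. All names (`smc_stochastic_attention`, `params`, `window`) are illustrative. Each particle carries its own window of past keys and values. The log-likelihood estimate is the running sum of the log mean unnormalized weights. The gradient estimation mentioned in the abstract is not shown.

```python
import numpy as np


def smc_stochastic_attention(y, params, n_particles=100, window=5, rng=None):
    """Bootstrap particle filter for a toy stochastic self-attention model.

    Hypothetical model (a simplification, not the paper's exact one):
      q_t, k_t, v_t = W{q,k,v} y_{t-1} + sigma * noise      (stochastic states)
      z_t = softmax attention over the last `window` keys/values + sigma * noise
      y_t ~ N(Wo z_t, omega^2 I)
    Returns an estimate of log p(y_1..y_{T-1} | y_0) and the resampled,
    equally weighted attention-vector particles at the final step.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, d = y.shape
    Wq, Wk, Wv, Wo = (params[name] for name in ("Wq", "Wk", "Wv", "Wo"))
    sigma, omega = params["sigma"], params["omega"]
    N = n_particles

    # Each particle stores its own window of past stochastic keys and values.
    keys = np.zeros((N, 0, d))
    values = np.zeros((N, 0, d))
    z = np.zeros((N, d))
    log_lik = 0.0

    for t in range(1, T):
        x = y[t - 1]  # the previous observation is the input at step t
        # Propagate: sample a stochastic query, key and value for each particle.
        q = x @ Wq.T + sigma * rng.standard_normal((N, d))
        k = x @ Wk.T + sigma * rng.standard_normal((N, d))
        v = x @ Wv.T + sigma * rng.standard_normal((N, d))
        keys = np.concatenate([keys, k[:, None, :]], axis=1)[:, -window:]
        values = np.concatenate([values, v[:, None, :]], axis=1)[:, -window:]
        # Scaled dot-product attention over each particle's own window.
        scores = np.einsum("nd,nwd->nw", q, keys) / np.sqrt(d)
        scores -= scores.max(axis=1, keepdims=True)
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)
        z = np.einsum("nw,nwd->nd", attn, values) + sigma * rng.standard_normal((N, d))
        # Weight each particle by the Gaussian log-density of y_t given z_t.
        resid = y[t] - z @ Wo.T
        log_w = -0.5 * np.sum(resid**2, axis=1) / omega**2 - d * np.log(omega * np.sqrt(2 * np.pi))
        # Log-likelihood increment: log of the mean unnormalized weight (stable form).
        m = log_w.max()
        log_lik += m + np.log(np.mean(np.exp(log_w - m)))
        weights = np.exp(log_w - m)
        weights /= weights.sum()
        # Multinomial resampling of the full particle state, window histories included.
        idx = rng.choice(N, size=N, p=weights)
        keys, values, z = keys[idx], values[idx], z[idx]
    return log_lik, z
```

Passing a seeded `np.random.default_rng` makes runs reproducible. Resampling at every step is the simplest choice. Adaptive resampling, triggered when the effective sample size drops below a threshold, is a common refinement.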

