LG · AI · Nov 15, 2022

Latent Bottlenecked Attentive Neural Processes

Leo Feng, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed

arXiv:2211.08458v3 · 19.232 citations · h-index: 212 · Has Code

Originality: Incremental advance

AI Analysis

The work addresses scalability issues in meta-learning that affect both researchers and practitioners, though the advance is incremental because it builds on existing NP variants.

The paper tackles the computational inefficiency of Transformer Neural Processes (TNPs), whose cost grows quadratically with the context size. It proposes Latent Bottlenecked Attentive Neural Processes (LBANPs), which achieve sub-quadratic complexity while remaining competitive with state-of-the-art methods on tasks such as meta-regression and image completion.

Neural Processes (NPs) are popular meta-learning methods that estimate predictive uncertainty on target datapoints by conditioning on a context dataset. The previous state-of-the-art method, Transformer Neural Processes (TNPs), achieves strong performance but requires computation quadratic in the number of context datapoints, which significantly limits its scalability. Conversely, existing sub-quadratic NP variants perform significantly worse than TNPs. To tackle this issue, we propose Latent Bottlenecked Attentive Neural Processes (LBANPs), a new computationally efficient sub-quadratic NP variant whose querying computational complexity is independent of the number of context datapoints. The model encodes the context dataset into a constant number of latent vectors on which self-attention is performed. When making predictions, the model retrieves higher-order information from the context dataset via multiple cross-attention mechanisms on the latent vectors. We empirically show that LBANPs achieve results competitive with the state of the art on meta-regression, image completion, and contextual multi-armed bandits. We demonstrate that LBANPs can trade off computational cost against performance according to the number of latent vectors. Finally, we show that LBANPs can scale beyond existing attention-based NP variants to larger dataset settings.
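
The latent bottleneck described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head attention without learned projections, MLPs, or normalization, and the residual connections, the two-layer depth, and the function names `attention`, `encode_context`, and `query_targets` are hypothetical choices made for illustration. What it shows is only the shape argument: the context is compressed into a fixed number of latent vectors once, and each target query then attends to those latents rather than to the context itself.

```python
import numpy as np


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values


def encode_context(context, latents, num_layers=2):
    """Compress N context embeddings into a fixed set of L latent vectors.

    Each layer cross-attends from the latents to the context (L x N scores)
    and then self-attends over the latents (L x L scores). The latents of
    every layer are kept so they can be reused for any number of queries.
    """
    per_layer = []
    for _ in range(num_layers):
        latents = latents + attention(latents, context, context)
        latents = latents + attention(latents, latents, latents)
        per_layer.append(latents)
    return per_layer


def query_targets(targets, per_layer_latents):
    """Refine M target embeddings by cross-attending to cached latents.

    Each layer computes M x L scores, so the cost does not depend on N.
    """
    for latents in per_layer_latents:
        targets = targets + attention(targets, latents, latents)
    return targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, num_latents = 16, 8
    initial_latents = rng.normal(size=(num_latents, dim))
    targets = rng.normal(size=(5, dim))
    for num_context in (100, 10_000):
        context = rng.normal(size=(num_context, dim))
        cached = encode_context(context, initial_latents)
        out = query_targets(targets, cached)
        print(num_context, [z.shape for z in cached], out.shape)
```

In this sketch the encoding step touches the context through score matrices of size L x N, and the query step only sees the cached latents, so increasing the number of latent vectors raises both cost and the capacity of the bottleneck. That is the trade-off the abstract refers to; the actual cost and accuracy figures are reported in the paper.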
