Exploring Pseudo-Token Approaches in Transformer Neural Processes
This work addresses a computational bottleneck that limits meta-learning in real-world applications and offers a tunable balance between performance and efficiency. The contribution is incremental, however, since it builds on existing pseudo-token approaches.
The paper tackles the quadratic computational complexity of Transformer Neural Processes (TNPs) by introducing Induced Set Attentive Neural Processes (ISANPs), which use pseudo-tokens to reduce computational demands. ISANPs achieve performance competitive with TNPs and often surpass state-of-the-art models in tasks like 1D regression and image completion.
Neural Processes (NPs) have gained attention in meta-learning for their ability to quantify uncertainty, together with their rapid prediction and adaptability. However, traditional NPs are prone to underfitting. Transformer Neural Processes (TNPs) significantly outperform existing NPs, yet their applicability in real-world scenarios is hindered by computational complexity that is quadratic in the number of both context and target data points. To address this, pseudo-token-based TNPs (PT-TNPs) have emerged as a novel subset of NPs that condense context data into latent vectors, or pseudo-tokens, reducing computational demands. We introduce Induced Set Attentive Neural Processes (ISANPs), which employ Induced Set Attention and an innovative query phase to improve querying efficiency. Our evaluations show that ISANPs perform competitively with TNPs and often surpass state-of-the-art models in 1D regression, image completion, contextual bandits, and Bayesian optimization. Crucially, ISANPs offer a tunable balance between performance and computational complexity, and they scale well to larger datasets where TNPs face limitations.
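The pseudo-token idea described above can be illustrated with a minimal NumPy sketch. It is not the ISANP architecture: it assumes single-head attention with no learned projections, and the function names (`cross_attention`, `pseudo_token_predict`) and the sizes `N`, `T`, `M`, `d` are hypothetical choices made for illustration. It shows only how M pseudo-tokens first condense N context points and how T target queries then attend to those M latents instead of to the full context, so that the attention score matrices grow with M rather than quadratically with the number of data points.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # Scaled dot-product attention (single head, no learned projections).
    # The score matrix has shape (num_queries, num_keys).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values


def pseudo_token_predict(context, targets, pseudo_tokens):
    # Step 1: M pseudo-tokens attend to N context points -> M latent vectors.
    # Score matrix is (M, N).
    latents = cross_attention(pseudo_tokens, context, context)
    # Step 2: T target queries attend to the M latents only.
    # Score matrix is (T, M).
    return cross_attention(targets, latents, latents)


rng = np.random.default_rng(0)
N, T, M, d = 512, 256, 16, 32            # illustrative sizes (assumptions)
context = rng.normal(size=(N, d))        # stand-in for embedded context points
targets = rng.normal(size=(T, d))        # stand-in for embedded target queries
pseudo_tokens = rng.normal(size=(M, d))  # would be learned in a real model

out = pseudo_token_predict(context, targets, pseudo_tokens)

# Number of attention scores computed in each scheme.
full_cost = (N + T) ** 2   # self-attention over all context and target tokens
pt_cost = M * N + T * M    # the two cross-attention steps above
print(out.shape, full_cost, pt_cost)
```

In this sketch the number of pseudo-tokens `M` is the tuning knob: a larger `M` keeps more information about the context at a higher cost, while a smaller `M` is cheaper but compresses the context more aggressively.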