LG | Oct 11, 2024

DeepOSets: Non-Autoregressive In-Context Learning with Permutation-Invariance Inductive Bias

arXiv:2410.09298v4 | h-index: 38
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient ICL models in machine learning, though it is incremental as it builds on existing set and operator learning architectures.

The paper addresses in-context learning (ICL) by proposing DeepOSets, a non-autoregressive neural architecture with a permutation-invariance inductive bias, and reports that it is accurate and fast while using an order of magnitude fewer parameters than a comparable transformer-based alternative.

In-context learning (ICL) is the remarkable ability displayed by some machine learning models to learn from examples provided in a user prompt without any model parameter updates. ICL was first observed in the domain of large language models, and it has been widely assumed that it is a product of the attention mechanism in autoregressive transformers. In this paper, using stylized regression learning tasks, we demonstrate that ICL can emerge in a non-autoregressive neural architecture with a hard-coded permutation-invariance inductive bias. This novel architecture, called DeepOSets, combines the set learning properties of the DeepSets architecture with the operator learning capabilities of Deep Operator Networks (DeepONets). We provide a representation theorem for permutation-invariant regression learning operators and prove that DeepOSets are universal approximators of this class of operators. We performed comprehensive numerical experiments to evaluate the capabilities of DeepOSets in learning linear, polynomial, and shallow neural network regression, under varying noise levels, dimensionalities, and sample sizes. In the high-dimensional regime, accuracy was enhanced by replacing the DeepSets layer with a Set Transformer. Our results show that DeepOSets deliver accurate and fast results with an order of magnitude fewer parameters than a comparable transformer-based alternative.
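The combination described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes one plausible wiring in which a DeepSets-style encoder mean-pools the in-context pairs (x_i, y_i) into a fixed-size vector, a DeepONet-style branch network maps that vector to coefficients, and a trunk network evaluates basis functions at the query points. The function names (`mlp_init`, `mlp_apply`, `deeposets_forward`), the layer sizes, the tanh activation, and the choice of mean pooling are all hypothetical, and the weights are random and untrained, so the sketch only demonstrates the permutation-invariance property, not regression accuracy.

```python
import numpy as np

def mlp_init(rng, sizes):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(layers, x):
    """Apply the network; tanh on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

def deeposets_forward(params, xs, ys, x_query):
    """Hypothetical DeepOSets-style forward pass (illustrative assumption).

    xs: (n, d) in-context inputs, ys: (n,) in-context targets,
    x_query: (m, d) points at which to predict. Returns (m,) predictions.
    """
    pairs = np.concatenate([xs, ys[:, None]], axis=1)   # (n, d + 1)
    # DeepSets-style encoding: per-element map, then order-independent pooling.
    h = mlp_apply(params["phi"], pairs).mean(axis=0)    # (latent,)
    # DeepONet-style branch/trunk: coefficients times basis functions.
    coeffs = mlp_apply(params["branch"], h)             # (p,)
    basis = mlp_apply(params["trunk"], x_query)         # (m, p)
    return basis @ coeffs                               # (m,)

rng = np.random.default_rng(0)
d, latent, p = 1, 16, 8  # assumed sizes, chosen only for illustration
params = {
    "phi": mlp_init(rng, [d + 1, 32, latent]),
    "branch": mlp_init(rng, [latent, 32, p]),
    "trunk": mlp_init(rng, [d, 32, p]),
}

# A noisy linear regression "prompt" of 10 examples and 5 query points.
xs = rng.uniform(-1.0, 1.0, (10, d))
ys = 2.0 * xs[:, 0] + 0.1 * rng.normal(size=10)
x_query = np.linspace(-1.0, 1.0, 5)[:, None]

pred = deeposets_forward(params, xs, ys, x_query)

# Reordering the in-context examples should leave the prediction unchanged
# (up to floating-point rounding in the mean).
perm = rng.permutation(len(xs))
pred_perm = deeposets_forward(params, xs[perm], ys[perm], x_query)
print(np.allclose(pred, pred_perm))
```

Because the pooling step is a mean over the set of examples, the whole prompt is processed in one forward pass with no dependence on example order, which is the sense in which such a model is non-autoregressive and permutation-invariant. Swapping the mean-pooled encoder for an attention-based set encoder would correspond loosely to the Set Transformer variant the abstract mentions for the high-dimensional regime.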

