CL · Aug 20, 2025

Scaled Signed Averaging Improves In-Context and Early Learning Benchmark Performance in Small Transformers

arXiv:2508.14685v2 · h-index: 4
Originality: Highly original
AI Analysis

This paper addresses performance bottlenecks in small transformers on in-context and early learning tasks. The proposed method is novel but incremental, and it shows strong gains on specific tasks.

The paper examines the limitations of in-context learning (ICL) in small transformers on semantic and linear-function tasks. It identifies Softmax as a contributing factor and proposes scaled signed averaging (SSA) as an alternative. SSA significantly improves performance on the ICL tasks and outperforms Softmax-based models on early learning benchmarks and linguistic probing tasks in zero- and few-shot settings.

While large language models' abilities for in-context learning (ICL) have drawn much attention, we examine some of the limitations of ICL on semantic tasks involving quantifiers like "all" and "some", as well as on tasks with linear functions. We identify Softmax, the scoring function in the attention mechanism, as a contributing factor to these limitations. We propose scaled signed averaging (SSA), a novel alternative to Softmax, to mitigate these problems. We show that SSA significantly improves performance on our ICL tasks. In addition, SSA outperforms transformer models with Softmax on several early learning NLP benchmarks and linguistic probing tasks in zero- and few-shot settings.
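For readers unfamiliar with the role Softmax plays here, the following is a minimal sketch of standard scaled dot-product attention for a single query, written in plain Python. It shows only the Softmax baseline the abstract refers to. It is not SSA, whose formula is not given in this summary and is therefore not reproduced. The function names (`softmax`, `attention_weights`, `attend`) and the toy inputs are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(scores):
    """Map raw scores to strictly positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product scores for one query, normalized with Softmax."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


def attend(query, keys, values):
    """Weighted average of the value vectors under the Softmax weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy inputs: one query, two keys, two one-dimensional values.
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [-1.0, 0.0]]
    values = [[-1.0], [3.0]]
    print(attention_weights(query, keys))
    print(attend(query, keys, values))
```

Because Softmax weights are always positive and sum to one, the attended output in this sketch is a convex combination of the value vectors: every key receives some weight, and no weight can be negative. Whether this property is the specific cause of the limitations the authors report is not stated in the abstract.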

