CL · Oct 31, 2023

Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building

arXiv:2310.20589v1133 citations · h-index: 21
Originality: Synthesis-oriented
AI Analysis

This work addresses data efficiency in language model pretraining. It is incremental: it builds on the existing StructFormer architecture and does not achieve broad state-of-the-art gains.

The paper aims to improve data-efficient language models by incorporating unsupervised predictions of hierarchical sentence structure into the transformer architecture, using StructFormer and its variants. The models show promising improvements on some tasks but do not consistently outperform the RoBERTa baseline across the 39 BabyLM Challenge evaluation tasks.

In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. Concretely, we use the StructFormer architecture (Shen et al., 2021) and variants thereof. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data, and to yield performance improvements over a vanilla transformer architecture (Shen et al., 2021). Evaluation of our models on the 39 tasks provided by the BabyLM Challenge shows promising improvements on some tasks for models that integrate a hierarchical bias into the architecture, even though these models fail to consistently outperform the RoBERTa baseline provided by the shared task organizers across all tasks.

Code Implementations: 1 repo