CL · Aug 3, 2023

Baby's CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models

Zheyu Zhang, Han Yang, Bolei Ma, David Rügamer, Ercong Nie

arXiv:2308.01684v221.7136 citations · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of improving reasoning in small-scale models for natural language understanding. The advance is incremental, because it builds on existing methods such as Chain of Thought prompting.

The paper tackles the problem of training compact language models efficiently by proposing a CoThought pipeline that restructures a small dataset using GPT-3.5-turbo to create task-oriented texts, resulting in a BabyLM that outperforms vanilla RoBERTa by over 3 points on 10 tasks across 4 benchmarks.

Large Language Models (LLMs) demonstrate remarkable performance on a variety of natural language understanding (NLU) tasks, primarily due to their in-context learning ability. This ability could be applied to building babylike models, i.e. models at small scales, improving training efficiency. In this paper, we propose a "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs) by leveraging the Chain of Thought prompting of LLMs. Our pipeline restructures a dataset of less than 100M in size using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to the school texts for language learners. The BabyLM is then pretrained on this restructured dataset in a RoBERTa fashion. In evaluations across 4 benchmarks, our BabyLM outperforms the vanilla RoBERTa in 10 linguistic, NLU, and question-answering tasks by more than 3 points, showing a superior ability to extract contextual information. These results suggest that compact LMs pretrained on small, LLM-restructured data can better understand tasks and achieve improved performance.
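To make "pretrained in a RoBERTa fashion" concrete, the following is a minimal sketch of the dynamic masked-language-modeling step that RoBERTa-style pretraining relies on. The abstract gives no hyperparameters, so the 15% selection rate and the 80/10/10 replacement split are the standard BERT/RoBERTa defaults, assumed here rather than taken from the paper. `MASK_ID`, `VOCAB_SIZE`, `IGNORE`, and `dynamic_mask` are hypothetical names chosen for illustration.

```python
import random

MASK_ID = 0          # hypothetical id of the <mask> token
VOCAB_SIZE = 1000    # hypothetical vocabulary size
IGNORE = -100        # label value meaning "not a prediction target"


def dynamic_mask(token_ids, rng, mask_prob=0.15):
    """Return (inputs, labels) with a freshly sampled mask for one sequence.

    Assumed defaults: 15% of positions are selected; of those, 80% become
    <mask>, 10% become a random token, and 10% are left unchanged.
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                      # position is not selected
        labels[i] = tok                   # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID           # 80%: replace with <mask>
        elif roll < 0.9:
            inputs[i] = rng.randrange(1, VOCAB_SIZE)  # 10%: random non-mask token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Toy sequence of token ids standing in for one restructured training text.
rng = random.Random(0)
tokens = [rng.randrange(1, VOCAB_SIZE) for _ in range(200)]
inputs, labels = dynamic_mask(tokens, rng)
```

Calling `dynamic_mask` again on the same sequence samples a new mask, which is what distinguishes dynamic masking from masking the corpus once during preprocessing. With a small restructured corpus, this lets each pass over the data present different prediction targets.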
