Syntactic Learnability of Echo State Neural Language Models at Scale
The study investigates whether simpler neural architectures such as Echo State Networks (ESNs) can learn language as well as Transformers. It finds that an ESN with a large hidden state performs comparably to or better than a Transformer on grammaticality judgment tasks when trained on about 100M words.
This result suggests that architectures as complex as the Transformer may not always be necessary for syntactic learning, which could reduce the computational cost of language modeling.
What is the neural model with minimal architectural complexity that still exhibits reasonable language learning capability? To explore such a simple but sufficient neural language model, we revisit a basic reservoir computing (RC) model, the Echo State Network (ESN), a restricted class of simple Recurrent Neural Networks. Our experiments show that an ESN with a large hidden state is comparable or superior to a Transformer on grammaticality judgment tasks when trained on about 100M words, suggesting that architectures as complex as the Transformer may not always be necessary for syntactic learning.
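To make the ESN idea concrete, the following is a minimal sketch of a toy next-token predictor, not the configuration used in the experiments. It shows the defining property of an ESN: the input and recurrent weights are drawn at random and left fixed, and only a linear readout is trained. The function names (`make_esn`, `run_reservoir`, `fit_readout`), the hyperparameters, the toy token sequence, and the ridge-regression readout are all assumptions made for illustration; the abstract does not state how the readout is trained at scale.

```python
import numpy as np

def make_esn(vocab_size, hidden_size, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(hidden_size, vocab_size))
    w_rec = rng.normal(size=(hidden_size, hidden_size))
    # Rescale so the largest absolute eigenvalue equals spectral_radius,
    # a common heuristic for keeping the reservoir dynamics stable.
    w_rec *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_rec)))
    return w_in, w_rec

def run_reservoir(tokens, w_in, w_rec):
    """Collect one hidden state per input token."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for t in tokens:
        # A one-hot input times w_in is just column t of w_in.
        h = np.tanh(w_in[:, t] + w_rec @ h)
        states.append(h)
    return np.stack(states)

def fit_readout(states, targets, vocab_size, ridge=1e-2):
    """Fit the only trained weights: a linear readout, via ridge regression
    onto one-hot next-token targets (an illustrative choice)."""
    y = np.eye(vocab_size)[targets]
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ y)  # shape: (hidden_size, vocab_size)

# Toy data: a hypothetical repeating sequence over a 3-token vocabulary.
tokens = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
w_in, w_rec = make_esn(vocab_size=3, hidden_size=32)
states = run_reservoir(tokens[:-1], w_in, w_rec)
w_out = fit_readout(states, tokens[1:], vocab_size=3)
pred = (states @ w_out).argmax(axis=1)  # predicted next-token ids
```

Because `w_in` and `w_rec` stay fixed, the trainable parameter count grows only with the readout, which is what makes a very large hidden state affordable in this family of models.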