CL AI · Apr 8, 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV

Harvard

arXiv:2404.05892v4 29.6173 citations h-index: 48 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and expressive sequence models for tasks like language processing, though it appears incremental as it builds directly on the existing RWKV architecture.

The authors introduce Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve on RWKV-4 with multi-headed matrix-valued states and dynamic recurrence, increasing expressivity while keeping RNN-like inference efficiency. They train four Eagle models, from 0.46 to 7.5 billion parameters, and two Finch models, at 1.6 and 3.1 billion parameters, and report competitive performance across a variety of benchmarks.
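To make "matrix-valued state" and "dynamic recurrence" concrete, here is a minimal single-head sketch of a linear recurrence in this general family. It is not the paper's exact formulation: it assumes a state update of the form S_t = diag(w_t) S_{t-1} + k_t^T v_t with readout o_t = r_t S_t, and it leaves out details such as bonus terms, normalization, gating, and token shift. The names `step` and `run` are hypothetical. Passing a different decay vector `w` at each step stands in for a data-dependent (dynamic) recurrence; repeating the same vector gives a static one.

```python
def step(state, r, k, v, w):
    """One recurrent step for a single head (illustrative assumption, not the paper's equations).

    state: D x D list of lists (the matrix-valued state)
    r, k, v: length-D lists (receptance-like query, key, value)
    w: length-D list of per-channel decay factors in (0, 1)
    """
    d = len(k)
    # Decay each row of the old state, then add the outer product k^T v.
    new_state = [
        [w[i] * state[i][j] + k[i] * v[j] for j in range(d)]
        for i in range(d)
    ]
    # Read out with r: a length-D vector r @ new_state.
    out = [sum(r[i] * new_state[i][j] for i in range(d)) for j in range(d)]
    return new_state, out


def run(rs, ks, vs, ws):
    """Process a sequence step by step; the state size stays D x D regardless of length."""
    d = len(ks[0])
    state = [[0.0] * d for _ in range(d)]
    outs = []
    for r, k, v, w in zip(rs, ks, vs, ws):
        state, out = step(state, r, k, v, w)
        outs.append(out)
    return state, outs
```

Because each step touches only the fixed-size state, per-token inference cost and memory do not grow with sequence length, which is the RNN-like efficiency the summary refers to.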

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.

Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
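The abstract mentions a tokenizer based on greedy matching. One common way to implement greedy (longest-match) tokenization is a trie walk over the input bytes, sketched below. This is an illustration of the general technique only, not the released RWKV tokenizer: the vocabulary `TOY_VOCAB` and the names `TrieNode`, `build_trie`, and `greedy_encode` are hypothetical.

```python
class TrieNode:
    """Trie node keyed by byte value; token_id is set where a vocabulary entry ends."""
    __slots__ = ("children", "token_id")

    def __init__(self):
        self.children = {}
        self.token_id = None


def build_trie(vocab):
    """Build a byte-level trie from a {bytes: int} vocabulary."""
    root = TrieNode()
    for token, token_id in vocab.items():
        node = root
        for byte in token:  # iterating bytes yields ints
            node = node.children.setdefault(byte, TrieNode())
        node.token_id = token_id
    return root


def greedy_encode(data, root):
    """At each position, emit the longest vocabulary entry that matches, then advance."""
    ids = []
    pos = 0
    while pos < len(data):
        node = root
        best_id, best_end = None, pos
        i = pos
        # Walk the trie as far as the input allows, remembering the last full match.
        while i < len(data) and data[i] in node.children:
            node = node.children[data[i]]
            i += 1
            if node.token_id is not None:
                best_id, best_end = node.token_id, i
        if best_id is None:
            raise ValueError(f"no token covers byte at offset {pos}")
        ids.append(best_id)
        pos = best_end
    return ids


# Toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4, b"bc": 5}
TRIE = build_trie(TOY_VOCAB)
```

With this toy vocabulary, `greedy_encode(b"abcab", TRIE)` consumes `abc` first and then `ab`, rather than splitting into shorter pieces. Greedy matching needs no merge table and does a single left-to-right pass, which is one reason such tokenizers can be fast.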

