Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
This work addresses efficiency and effectiveness challenges in foundation model architectures for AI applications and presents a competitive alternative to existing architectures.
The paper aims to make foundation models more efficient and effective by combining sequence and state transformations. Reported results include a perplexity reduction of more than 4%, 100% accuracy on a challenging multi-query associative recall task (an improvement of more than 150% over the compared baselines), and expert retrieval that is 8 to 10 times faster.
To make foundation models more efficient and effective, we propose combining sequence transformation and state transformation. First, we prove that rotary position embedding is applicable in the state space duality algorithm, which reduces the perplexity of hybrid quadratic causal self-attention and state space duality by more than 4% and ensures that the combined sequence transformation uses a unified position encoding. Second, we propose dynamic mask attention, which maintains 100% accuracy on the more challenging multi-query associative recall task, an improvement of more than 150% over quadratic causal self-attention and state space duality, and ensures that the combined sequence transformation selectively filters relevant information. Third, we design cross domain mixture of experts, which makes expert retrieval with more than 1024 experts 8 to 10 times faster than in the mixture of experts and ensures that the combined state transformation retrieves the expert mixture quickly. Finally, we summarize these matrix algorithms, which together can form a foundation model, as Wonderful Matrices, a potential competitor to popular model architectures.
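The position-encoding claim relies on a property of rotary position embedding that is easy to check numerically: after rotation, the inner product of a query and a key depends only on their relative offset. The following is a minimal sketch of standard rotary position embedding in plain Python, not the paper's state space duality integration; the helper names `rope` and `dot`, the base of 10000, and the example vectors are illustrative assumptions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply standard rotary position embedding to `vec` at position `pos`.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    pos * base ** (-i / d), where d is the (even) vector length.
    The base of 10000 is the commonly used default, assumed here.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors, chosen only for illustration.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(abs(score_a - score_b) < 1e-9)
```

Because the rotation is orthogonal, it also leaves vector norms unchanged, so position information enters only through the relative angle between query and key. This is the sense in which a single rotary encoding could serve both an attention-style and a state-space-style sequence transformation, as the abstract argues.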