Iterated Piecewise Affine (IPA) Approximation for Language Modeling
This work addresses language modeling for NLP applications. The contribution appears incremental: it builds on existing methods such as Transformers, although the approximation approach itself is new.
The paper approaches language modeling by approximating a generic function with a first-order Taylor expansion, enhanced with iteration and piecewise modeling. The resulting IPA algorithm outperforms Transformers by 1.5% in next-token prediction for smaller sequence lengths.
In this work, we demonstrate the application of a first-order Taylor expansion to approximate a generic function $F: \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times m}$ and utilize it in language modeling. To enhance the basic Taylor expansion, we introduce iteration and piecewise modeling, leading us to name the algorithm the Iterative Piecewise Affine (IPA) approximation. The final algorithm exhibits interesting resemblances to the Transformer decoder architecture. Comparing parameter arrangements in IPA and Transformers, we observe strikingly similar performance, with IPA outperforming Transformers by 1.5\% in the next-token prediction task with cross-entropy loss for smaller sequence lengths.
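To make the starting point concrete, the first-order Taylor expansion of $F$ around a point $X_0$ is $F(X) \approx F(X_0) + J_F(X_0)\,(X - X_0)$, where $J_F$ is the Jacobian of $F$ with respect to the flattened input. The Python sketch below illustrates that expansion and one simple way to make it piecewise, by switching to the expansion around the nearest of several anchor points. It is not the IPA algorithm itself: the names `affine_approx` and `piecewise_affine`, the finite-difference Jacobian, the nearest-anchor rule, and the choice of `np.tanh` as a stand-in for $F$ are all assumptions made for illustration, and the iteration step is not covered.

```python
import numpy as np


def affine_approx(F, X0, eps=1e-6):
    """Return X -> F(X0) + J(X0) (X - X0), the first-order Taylor model of F at X0.

    The Jacobian J is estimated with central differences (an illustrative
    choice; automatic differentiation would work equally well).
    """
    X0 = np.asarray(X0, dtype=float)
    F0 = np.asarray(F(X0), dtype=float)
    J = np.empty((F0.size, X0.size))
    for k in range(X0.size):
        # Perturb the k-th entry of the flattened input.
        E = np.zeros(X0.size)
        E[k] = eps
        E = E.reshape(X0.shape)
        J[:, k] = (F(X0 + E) - F(X0 - E)).ravel() / (2 * eps)

    def approx(X):
        dX = (np.asarray(X, dtype=float) - X0).ravel()
        return F0 + (J @ dX).reshape(F0.shape)

    return approx


def piecewise_affine(F, anchors):
    """Hypothetical piecewise model: use the Taylor model of the nearest anchor."""
    pieces = [(np.asarray(A, dtype=float), affine_approx(F, A)) for A in anchors]

    def approx(X):
        X = np.asarray(X, dtype=float)
        # Nearest anchor in Frobenius norm selects the active affine piece.
        _, piece = min(pieces, key=lambda p: np.linalg.norm(X - p[0]))
        return piece(X)

    return approx


if __name__ == "__main__":
    F = np.tanh  # stand-in for a generic F: R^{n x m} -> R^{n x m}
    n, m = 2, 3
    single = affine_approx(F, np.zeros((n, m)))
    multi = piecewise_affine(F, [np.zeros((n, m)), np.ones((n, m))])
    X = np.full((n, m), 0.9)
    print("single-piece max error:", np.abs(single(X) - F(X)).max())
    print("two-piece max error:   ", np.abs(multi(X) - F(X)).max())
```

Far from the expansion point a single affine model degrades, which is the motivation for using several pieces: in the demo above, the input lies closer to the second anchor, so the two-piece model evaluates the expansion around that anchor instead.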