Formal Aspects of Language Modeling
The paper offers developers and researchers foundational knowledge of LLM theory, but its contribution is incremental: it compiles existing material for educational purposes.
The paper addresses the need to understand the mathematical foundations of large language models (LLMs) by providing formal theoretical notes. The need arises because the widespread deployment of LLMs in NLP tools has significantly improved the performance of those tools and brought LLMs into public discourse.
Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. Consequently, it is important for developers and researchers alike to understand the mathematical foundations of large language models, as well as how to implement them. These notes accompany the theoretical portion of the ETH Zürich course on large language models and cover what constitutes a language model from a formal, theoretical perspective.