LokiLM: Technical Report
This work addresses the problem of developing efficient, smaller-scale language models for natural language reasoning. The contribution is incremental: it builds on existing methods such as knowledge distillation and targets competitive performance within a specific parameter range.
The authors introduce LokiLM, a 1.4B-parameter large language model trained on 500B tokens, which achieves state-of-the-art performance among models with 1.5B parameters or fewer on natural language reasoning tasks. However, the model exhibits a concerning number of hallucinations and scores poorly on TruthfulQA, so it is not released publicly.
In this work, we introduce LokiLM, a 1.4B-parameter large language model trained on 500B tokens. Our model performs strongly on natural language reasoning tasks and achieves state-of-the-art performance among models with 1.5B parameters or fewer. LokiLM is trained using multi-teacher knowledge distillation and high-quality training data to achieve benchmark results competitive with larger models trained on significantly more tokens. We support these findings by introducing steps to avoid benchmark contamination and overfitting throughout our development process. Despite its promising performance, LokiLM exhibits a concerning number of hallucinations and scores poorly on the TruthfulQA benchmark, so we do not release the model publicly.
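To make the term "multi-teacher knowledge distillation" concrete, the following is a minimal sketch of one common formulation for a single token position. It is not taken from this report. It assumes the teachers' temperature-softened distributions are combined by a weighted average and that the student is trained to minimize the KL divergence to that mixture. The function names, the uniform default weights, and the temperature value are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          weights=None, temperature=2.0):
    """KL(mixture of teachers || student) for one token position.

    Assumption (not from the report): teachers are combined by a
    weighted average of their softened probability distributions.
    """
    n = len(teacher_logits_list)
    if weights is None:
        weights = [1.0 / n] * n  # hypothetical default: uniform weighting

    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    vocab = len(student_logits)

    # Weighted mixture of the teacher distributions.
    target = [sum(w * p[i] for w, p in zip(weights, teacher_probs))
              for i in range(vocab)]
    student = softmax(student_logits, temperature)

    # KL divergence from the student to the teacher mixture.
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)

    # Conventional T^2 factor keeps gradient magnitudes comparable
    # across temperatures.
    return kl * temperature ** 2


# Usage: two hypothetical teachers that disagree, one student.
loss = multi_teacher_kd_loss(
    student_logits=[0.5, 0.2, -0.1],
    teacher_logits_list=[[2.0, 0.0, -1.0], [0.0, 2.0, -1.0]],
)
```

In practice this term is usually added to the ordinary next-token cross-entropy loss and averaged over all positions in a batch. How the teachers are actually weighted or selected in LokiLM is not specified in this abstract.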