CL | AI | Dec 14, 2023

TigerBot: An Open Multilingual Multitask LLM

arXiv:2312.08688v2 | 14 citations | h-index: 2 | Has Code
Originality: Synthesis-oriented
AI Analysis

TigerBot is an incremental improvement in open-source multilingual LLMs that makes them more accessible to researchers and developers and more usable in real-world applications.

The authors introduce the TigerBot family of large language models and report a 6% performance gain in English and a 20% gain in Chinese over state-of-the-art open-source models such as Llama-2, along with leading results on major benchmarks.

We introduce and release the TigerBot family of large language models (LLMs), consisting of base and chat models with 7, 13, 70, and 180 billion parameters. We develop our models starting from Llama-2 and BLOOM, and push the boundary further in data, training algorithm, infrastructure, and application tools. Our models yield a meaningful performance gain over SOTA open-source models such as Llama-2: specifically, a 6% gain in English and a 20% gain in Chinese. The TigerBot model family also achieves leading performance on major academic and industrial benchmarks and leaderboards. We believe that TigerBot represents just a snapshot of the lightning-fast progress in the open-source LLM community. We are therefore thrilled to give back by publicly releasing our models and reporting the approach behind them, with additional emphasis on building SOTA LLMs in a democratized way and on making LLMs useful in real-world applications.

Code Implementations: 1 repo
