T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground
This work addresses the need for efficient and practical Russian-language LLM applications, though it is incremental in that it builds on existing methods such as EAGLE speculative decoding.
The authors address efficient hybrid reasoning for Russian-language tasks by introducing T-pro 2.0, an open-weight LLM with an adapted speculative-decoding pipeline that reduces latency. They release the model weights, a 500k instruction corpus, a reasoning benchmark, and a public demo to support accessible research and applications.
We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports direct answering and reasoning-trace generation, using a Cyrillic-dense tokenizer and an adapted EAGLE speculative-decoding pipeline to reduce latency. To enable reproducible and extensible research, we release the model weights, the T-Wix 500k instruction corpus, the T-Math reasoning benchmark, and the EAGLE weights on Hugging Face. These resources allow users to study Russian-language reasoning and to extend or adapt both the model and the inference pipeline. A public web demo exposes reasoning and non-reasoning modes and illustrates the speedups achieved by our inference stack across domains. T-pro 2.0 thus serves as an accessible open system for building and evaluating efficient, practical Russian LLM applications.
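To illustrate why speculative decoding can reduce latency, the following is a minimal sketch of the generic greedy draft-and-verify loop. It is not the adapted EAGLE pipeline described above, which uses a learned draft model and is not reproduced here. All names (`speculative_decode`, `toy_target`, `toy_draft`) are hypothetical, and the "models" are toy functions over integer tokens. In a real system the verification step is a single batched forward pass of the target model, which is where the speedup would come from.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (assumption)


def greedy_decode(target_next: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify; returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    rounds = 0
    produced = 0
    while produced < max_new_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx = list(tokens)
        draft: List[Token] = []
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) The target checks each proposal. Simulated position by position
        #    here; a real implementation scores all k positions in one pass.
        rounds += 1
        ctx = list(tokens)
        accepted: List[Token] = []
        for t in draft:
            expected = target_next(ctx)
            if expected == t:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # replace the first mismatch, stop
                break
        else:
            accepted.append(target_next(ctx))  # all accepted: one bonus token

        accepted = accepted[: max_new_tokens - produced]
        tokens.extend(accepted)
        produced += len(accepted)
    return tokens, rounds


# Toy models: the target counts modulo 10; the draft agrees except after a 2.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 9 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [0], 12, k=4)
    print(out == greedy_decode(toy_target, [0], 12), rounds)
```

Under greedy decoding the output matches what the target model alone would produce, so the draft model affects only the number of target passes, not the result. The achievable speedup depends on how often the draft agrees with the target.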