Yankai Liu

CL, Jul 26, 2025

JT-Math: A Multi-Stage Framework for Advanced Mathematical Reasoning in Large Language Models

Yifan Hao, Fangning Chao, Yaqian Hao et al.

Mathematical reasoning is a cornerstone of artificial general intelligence and a primary benchmark for evaluating the capabilities of Large Language Models (LLMs). While state-of-the-art models show promise, they often falter on complex problems that demand deep conceptual understanding and intricate, multi-step deliberation. To address this challenge, we introduce JT-Math-8B, a series of open-source models comprising base, instruct, and thinking versions, built on a systematic, multi-stage optimization framework. Our pre-training corpus is a high-quality, 210B-token dataset curated through a dedicated data pipeline that uses model-based validation to ensure quality and diversity. The Instruct Model is optimized for direct, concise answers through Supervised Fine-Tuning (SFT) and a GRPO-based reinforcement learning (RL) method. The Thinking Model is trained for complex problem-solving using a Long Chain-of-Thought (Long CoT) approach, combining SFT with a novel, multi-stage RL curriculum that progressively increases task difficulty and context length up to 32K tokens. JT-Math-8B achieves state-of-the-art results among open-source models of similar size, surpassing prominent models such as OpenAI's o1-mini and GPT-4o, and demonstrates superior performance on competition-level mathematics.
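The abstract only says the Instruct Model uses a "GRPO-based" RL method, so the exact variant is not specified here. As a minimal sketch, assuming standard GRPO with outcome rewards, the core idea is that several responses are sampled for the same prompt and each response's advantage is its reward normalized against its own group, which removes the need for a learned value network. The function name, the use of population standard deviation, and the binary correctness reward are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantage as in standard GRPO (outcome supervision).

    Each of the G responses sampled for one prompt gets
    (reward - group mean) / (group std + eps).
    Assumption: population std (pstdev); some implementations use the
    unbiased sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: 4 sampled solutions to one math problem,
# rewarded 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, -1.0, 1.0]
```

Correct answers are pushed up and incorrect ones pushed down relative to their siblings; if every response in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.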

MM, Jun 30, 2017

Evaluation of No Reference Bitstream-based Video Quality Assessment Methods

Tiantian He, Yankai Liu, Rong Xie et al.

Many different parametric models for video quality assessment have been proposed in the past few years. This paper reviews nine recent models that cover a wide range of methodologies and have been validated for estimating video quality under different degradation factors. Each model is briefly described with its key algorithms and relevant parametric formulas. The generalization capability of each model to estimate video quality in real application scenarios is evaluated and compared with the other models, using a dataset built from video sequences taken from practical applications. These sequences cover a wide range of realistic encoding parameters and are labeled with mean opinion scores (MOS) obtained through subjective testing. The strengths and weaknesses of each model are discussed. Finally, future work toward a more general parametric model that could apply to a wider range of applications is discussed.
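The abstract does not say which agreement measures are used to compare model predictions with MOS. A common choice in video quality assessment is Pearson (linear) and Spearman (rank-order) correlation between predicted scores and MOS, so the sketch below assumes those two metrics. The scores are hypothetical, and many studies also fit a nonlinear (e.g. logistic) mapping before computing the Pearson coefficient, which is omitted here.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted quality scores from one parametric model vs. subjective MOS.
predicted = [2.1, 3.4, 3.0, 4.5, 1.8]
mos = [2.0, 3.6, 3.1, 4.4, 1.5]
print(round(pearson(predicted, mos), 3), round(spearman(predicted, mos), 3))
```

Pearson rewards predictions that track MOS linearly, while Spearman only checks that the model orders the sequences the same way viewers do, which is why the two are usually reported together.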


2 Papers