Qing Tong

h-index10
2papers

2 Papers

LGSep 12, 2024
DiReDi: Distillation and Reverse Distillation for AIoT Applications

Chen Sun, Qing Tong, Wenshuang Yang et al.

Typically, the significant efficiency can be achieved by deploying different edge AI models in various real world scenarios while a few large models manage those edge AI models remotely from cloud servers. However, customizing edge AI models for each user's specific application or extending current models to new application scenarios remains a challenge. Inappropriate local training or fine tuning of edge AI models by users can lead to model malfunction, potentially resulting in legal issues for the manufacturer. To address aforementioned issues, this paper proposes an innovative framework called "DiReD", which involves knowledge DIstillation & REverse DIstillation. In the initial step, an edge AI model is trained with presumed data and a KD process using the cloud AI model in the upper management cloud server. This edge AI model is then dispatched to edge AI devices solely for inference in the user's application scenario. When the user needs to update the edge AI model to better fit the actual scenario, the reverse distillation (RD) process is employed to extract the knowledge: the difference between user preferences and the manufacturer's presumptions from the edge AI model using the user's exclusive data. Only the extracted knowledge is reported back to the upper management cloud server to update the cloud AI model, thus protecting user privacy by not using any exclusive data. The updated cloud AI can then update the edge AI model with the extended knowledge. Simulation results demonstrate that the proposed "DiReDi" framework allows the manufacturer to update the user model by learning new knowledge from the user's actual scenario with private data. The initial redundant knowledge is reduced since the retraining emphasizes user private data.

CLJan 23, 2025
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Zui Chen, Tianqiao Liu, Mi Tian et al.

Mathematical reasoning remains a challenging area for large language models (LLMs), prompting the development of math-specific LLMs such as LLEMMA, DeepSeekMath, and Qwen2-Math, among others. These models typically follow a two-stage training paradigm: pre-training with math-related corpora and post-training with problem datasets for supervised fine-tuning (SFT). Despite these efforts, the improvements in mathematical reasoning achieved through continued pre-training (CPT) are often less significant compared to those obtained via SFT. This study addresses this discrepancy by exploring alternative strategies during the pre-training phase, focusing on the use of problem-solving data over general mathematical corpora. We investigate three primary research questions: (1) Can problem-solving data enhance the model's mathematical reasoning capabilities more effectively than general mathematical corpora during CPT? (2) Are synthetic data from the same source equally effective, and which synthesis methods are most efficient? (3) How do the capabilities developed from the same problem-solving data differ between the CPT and SFT stages, and what factors contribute to these differences? Our findings indicate that problem-solving data significantly enhances the model's mathematical capabilities compared to general mathematical corpora. We also identify effective data synthesis methods, demonstrating that the tutorship amplification synthesis method achieves the best performance. Furthermore, while SFT facilitates instruction-following abilities, it underperforms compared to CPT with the same data, which can be partially attributed to its poor learning capacity for more challenging problem-solving data. These insights provide valuable guidance for optimizing the mathematical reasoning capabilities of LLMs, culminating in our development of a powerful mathematical base model called MathGPT-8B.