Huizhong Li

LG
h-index14
4papers
85citations
Novelty59%
AI Score41

4 Papers

LGMar 7, 2025Code
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Ling Team, Binwei Zeng, Chao Huang et al.

In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at https://huggingface.co/inclusionAI.

CLOct 25, 2025
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

Ling Team, Ang Li, Ben Liu et al.

We introduce Ling 2.0, a series reasoning-oriented language foundation built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with MTP for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.

CRJul 22, 2021
Improving Blockchain Consistency Bound by Assigning Weights to Random Blocks

Qing Zhang, Xueping Gong, Huizhong Li et al.

Blockchains based on the celebrated Nakamoto consensus protocol have shown promise in several applications, including cryptocurrencies. However, these blockchains have inherent scalability limits caused by the protocol's consensus properties. In particular, the \emph{consistency} property demonstrates a tight trade-off between block production speed and the system's security in terms of resisting adversarial attacks. This paper proposes a novel method, Ironclad, that improves blockchain consistency bound by assigning a different weight to randomly selected blocks. We apply our method to the original Nakamoto protocol and rigorously prove that such a combination can improve the consistency bound significantly by analyzing the fundamental consensus properties. Such an improvement enables a much faster block production rate than the original Nakamoto protocol under the same security guarantee with the same proportion of malicious mining power (see Figure 1).

LGJun 8, 2021
FL-Market: Trading Private Models in Federated Learning

Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa et al.

The difficulty in acquiring a sufficient amount of training data is a major bottleneck for machine learning (ML) based data analytics. Recently, commoditizing ML models has been proposed as an economical and moderate solution to ML-oriented data acquisition. However, existing model marketplaces assume that the broker can access data owners' private training data, which may not be realistic in practice. In this paper, to promote trustworthy data acquisition for ML tasks, we propose FL-Market, a locally private model marketplace that protects privacy not only against model buyers but also against the untrusted broker. FL-Market decouples ML from the need to centrally gather training data on the broker's side using federated learning, an emerging privacy-preserving ML paradigm in which data owners collaboratively train an ML model by uploading local gradients (to be aggregated into a global gradient for model updating). Then, FL-Market enables data owners to locally perturb their gradients by local differential privacy and thus further prevents privacy risks. To drive FL-Market, we propose a deep learning-empowered auction mechanism for intelligently deciding the local gradients' perturbation levels and an optimal aggregation mechanism for aggregating the perturbed gradients. Our auction and aggregation mechanisms can jointly maximize the global gradient's accuracy, which optimizes model buyers' utility. Our experiments verify the effectiveness of the proposed mechanisms.