Haihui Pan

CL · 4 papers · 7 citations · Novelty 57% · AI Score 46

4 Papers

92.8 · CL · May 27
Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity

Haihui Pan, Yuzhong Hong, Kaichen Zhang et al.

In many large language model (LLM) alignment applications, users expect outputs that are not only high-quality but also substantially diverse. However, existing methods often face a fundamental trade-off between these objectives: approaches that improve output quality tend to reduce diversity, while methods that increase diversity often do so at the expense of quality. In this work, we propose Quality-constrained Entropy Maximization Policy Optimization (QEMPO), a novel framework that enhances the diversity of LLM outputs while explicitly preserving output quality. QEMPO is grounded in a strong theoretical foundation: we derive a closed-form analytical solution that provably maximizes entropy, a principled measure of diversity, subject to a quality constraint, with guarantees of optimality under the defined objective. Leveraging this solution, QEMPO naturally supports both online and offline training. Empirical results show that QEMPO consistently improves output diversity without sacrificing quality, and in many cases improves both dimensions relative to existing baselines, consistent with our theoretical guarantees.
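The abstract does not state QEMPO's exact objective. As a hedged illustration of the general idea only, the sketch below solves the textbook version of the problem over a finite set of candidate responses with known quality scores: maximize Shannon entropy subject to an expected-quality threshold `tau`. The well-known closed-form answer is a Gibbs (softmax) distribution over the scores. The function name, the finite-candidate setting, and the bisection search for the multiplier are assumptions, not the paper's method.

```python
import math
from typing import List


def max_entropy_distribution(rewards: List[float], tau: float, iters: int = 200) -> List[float]:
    """Maximum-entropy distribution over a finite candidate set subject to
    sum_i p_i * r_i >= tau (a generic textbook setting, assumed here).

    If the uniform distribution already meets the constraint, it is optimal.
    Otherwise the optimum is a Gibbs distribution p_i proportional to
    exp(lam * r_i), with lam > 0 chosen so the constraint holds with equality.
    """
    n = len(rewards)
    if sum(rewards) / n >= tau:
        return [1.0 / n] * n
    if tau > max(rewards):
        raise ValueError("constraint infeasible: tau exceeds the best reward")

    def gibbs(lam: float) -> List[float]:
        m = max(rewards)
        w = [math.exp(lam * (r - m)) for r in rewards]  # shift by max for numerical stability
        z = sum(w)
        return [x / z for x in w]

    def mean_reward(lam: float) -> float:
        return sum(p * r for p, r in zip(gibbs(lam), rewards))

    # Expected reward is non-decreasing in lam (its derivative is a variance),
    # so bracket the root and then bisect for mean_reward(lam) == tau.
    lo, hi = 0.0, 1.0
    while mean_reward(hi) < tau and hi < 1e6:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_reward(mid) < tau:
            lo = mid
        else:
            hi = mid
    return gibbs(hi)  # hi always satisfies the constraint
```

For example, `max_entropy_distribution([1.0, 0.5, 0.0], 0.8)` gives roughly `[0.68, 0.24, 0.08]`: the quality constraint is met exactly while the remaining probability mass is spread as evenly as the constraint allows.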

61.4 · CL · May 27
FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning

Haihui Pan, Junwei Bao, Hongfei Jiang et al.

While large language models have made significant progress in mathematical reasoning, they remain unreliable at judging the correctness of their own solutions. Existing approaches that equip models with self-verification typically treat solution generation and verification as two separate tasks, leading to substantially increased training time. In this paper, we propose FABSVer, which fuses these two tasks into a single generation pass, dramatically reducing training overhead while jointly optimizing both capabilities. We further identify a convergence bottleneck both theoretically and empirically: as training progresses, the reward reaches a plateau because the policy is constrained by a fixed reference model. To overcome this, we introduce Dynamic Reference Model Update (DRMU), which raises the reward ceiling and enables sustained reward growth. Extensive experiments on math benchmarks demonstrate that FABSVer achieves superior self-verification and reasoning performance across three model scales, while requiring only 51% to 71% of the training time of existing methods. Analysis further reveals distinct learning phases in how models acquire self-verification, and that the gap between verify and answer rewards shrinks noticeably as model size increases.
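The abstract does not spell out how DRMU updates the reference. One standard way to see why a fixed reference caps the reward: for a KL-regularized objective E[r] - beta * KL(pi || ref), the optimal policy has the closed form pi*(a) proportional to ref(a) * exp(r(a) / beta), so with `ref` fixed the attainable expected reward stops at a fixed level. The toy sketch below, over a hypothetical finite action set, contrasts a fixed reference with one replaced by the current policy after each round. The `train` loop and its per-round update are illustrative assumptions, not FABSVer's training procedure.

```python
import math
from typing import List


def kl_regularized_optimum(ref: List[float], rewards: List[float], beta: float) -> List[float]:
    """Closed-form maximizer of E_pi[r] - beta * KL(pi || ref) on a finite action set:
    pi*(a) is proportional to ref(a) * exp(r(a) / beta)."""
    m = max(rewards)
    w = [p * math.exp((r - m) / beta) for p, r in zip(ref, rewards)]  # shift for stability
    z = sum(w)
    return [x / z for x in w]


def expected_reward(pi: List[float], rewards: List[float]) -> float:
    return sum(p * r for p, r in zip(pi, rewards))


def train(ref: List[float], rewards: List[float], beta: float,
          rounds: int, update_ref: bool) -> List[float]:
    """Toy loop: each round the policy jumps straight to the KL-regularized optimum.
    If update_ref is True (a hypothetical stand-in for a dynamic reference update),
    the reference is replaced by the current policy after each round."""
    history = []
    for _ in range(rounds):
        pi = kl_regularized_optimum(ref, rewards, beta)
        history.append(expected_reward(pi, rewards))
        if update_ref:
            ref = pi
    return history


# Usage: uniform reference over four actions, only the first one is rewarded.
fixed = train([0.25] * 4, [1.0, 0.0, 0.0, 0.0], beta=1.0, rounds=3, update_ref=False)
dynamic = train([0.25] * 4, [1.0, 0.0, 0.0, 0.0], beta=1.0, rounds=3, update_ref=True)
```

Replacing the reference after each round compounds the tilt: after k rounds the policy is proportional to ref_0(a) * exp(k * r(a) / beta), so the effective KL penalty shrinks and the expected reward keeps rising (about 0.48, 0.71, 0.87 in the usage example), whereas the fixed-reference run stays at about 0.48.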

CL · Jan 20, 2024 · Code
Orion-14B: Open-source Multilingual Large Language Models

Du Chen, Yi Huang, Xiaopu Li et al.

In this study, we introduce Orion-14B, a collection of multilingual large language models with 14 billion parameters. We use a data scheduling approach to train a foundational model on a diverse corpus of 2.5 trillion tokens, sourced from texts in English, Chinese, Japanese, Korean, and other languages. Additionally, we fine-tune a series of models tailored for conversational applications and other specific use cases. Our evaluation results demonstrate that Orion-14B achieves state-of-the-art performance across a broad spectrum of tasks. We make the Orion-14B model family and its associated code publicly accessible at https://github.com/OrionStarAI/Orion, aiming to inspire future research and practical applications in the field.

LG · Dec 8, 2019
Short-term Load Forecasting with Dense Average Network

Zhifang Liao, Haihui Pan, Qi Zeng et al.

As an important part of the power system, power load forecasting directly affects the national economy. Data show that improving load forecasting accuracy by just 0.01% can save the power industry millions of dollars. Improving the accuracy of power load forecasting has therefore long been a central goal for power systems. To this end, this paper proposes a novel connection, the dense average connection, in which the outputs of all preceding layers are averaged to form the input of the next layer in a feed-forward fashion. Based on the dense average connection, we construct the dense average network for power load forecasting. On two public datasets, the proposed model's predictions are more accurate than those of existing methods. Building on this, we apply an ensemble method to further improve the model's accuracy. To verify the reliability of the model's predictions, we analyze its robustness by adding disturbances to the inputs. The experimental results show that the proposed model is effective and robust for power load forecasting.
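The dense average connection described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' implementation: it assumes every layer keeps the same width (so outputs can be averaged element-wise) and that the network input counts as the output of a "layer 0". The tanh fully connected layers built by `make_dense_layer` are hypothetical placeholders.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_dense_layer(dim: int, rng: random.Random) -> Layer:
    """Hypothetical fully connected tanh layer; its width equals dim so that
    the outputs of different layers can be averaged element-wise."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim

    def layer(x: Vector) -> Vector:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, bias)]

    return layer


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dense_average_forward(x: Vector, layers: List[Layer]) -> Vector:
    outputs = [x]  # assumption: the network input counts as the output of "layer 0"
    for layer in layers:
        # Dense average connection: each layer's input is the mean of all preceding outputs.
        outputs.append(layer(average(outputs)))
    return outputs[-1]
```

With three layers that each double their input, an input of `[1.0]` yields `[4.0]`: the layers receive means of 1.0, 1.5 and 2.0 respectively. Averaging, unlike DenseNet-style concatenation, keeps the input width constant no matter how many layers precede it.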