CL Jan 9, 2024
TechGPT-2.0: A large language model project to solve the task of knowledge graph construction
Jiaqi Wang, Yuying Chang, Zhong Li et al.
Large language models have exhibited robust performance across diverse natural language processing tasks. This report introduces TechGPT-2.0, a project designed to enhance the capabilities of large language models specifically in knowledge graph construction tasks, including named entity recognition (NER) and relationship triple extraction (RTE) tasks in NLP applications. Additionally, it serves as an LLM accessible for research within the Chinese open-source model community. We offer two 7B large language model weights and a QLoRA weight specialized for processing lengthy texts. Notably, TechGPT-2.0 is trained on Huawei's Ascend server. Inheriting all functionalities from TechGPT-1.0, it exhibits robust text processing capabilities, particularly in the domains of medicine and law. Furthermore, we introduce new capabilities to the model, enabling it to process texts in various domains such as geographical areas, transportation, organizations, literary works, biology, natural sciences, astronomical objects, and architecture. These enhancements also strengthen the model's ability to handle hallucinations, unanswerable queries, and lengthy texts. This report provides a comprehensive and detailed introduction to the full fine-tuning process on Huawei's Ascend servers, encompassing experiences in Ascend server debugging, instruction fine-tuning data processing, and model training. Our code is available at https://github.com/neukg/TechGPT-2.0
AI Mar 26, 2025
CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision
Yifei Lu, Fanghua Ye, Jian Li et al.
Tool invocation significantly enhances the capabilities of Large Language Models (LLMs), yet challenges persist, particularly in complex task scenarios. Current methods, such as instruction-enhanced reasoning and supervised fine-tuning, often result in unnecessarily long reasoning paths and face difficulties in verifying the correctness of intermediate steps. In this paper, we propose CodeTool, a novel framework for stepwise code generation that improves LLM tool invocation by leveraging the concise and easily verifiable nature of code. CodeTool incorporates two distinct process rewards: the On-the-spot Reward, which provides immediate feedback on the accuracy of each tool invocation, and the Latent Reward, which assesses the contribution of each step toward overall task completion. By maximizing the cumulative reward of the On-the-spot and Latent Rewards at each step, LLMs are guided to follow efficient and accurate reasoning paths. Extensive experiments on StableToolBench and RestBench-TMDB demonstrate the superiority of CodeTool over existing approaches.
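The step-level selection described in the CodeTool abstract can be illustrated with a minimal sketch. It assumes the two process rewards are combined by simple addition and that each candidate step already carries both scores; the `Candidate` class, the `select_step` function, the tool names, and the numeric scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str            # candidate code snippet for the current step
    on_the_spot: float   # hypothetical score: did this tool invocation succeed?
    latent: float        # hypothetical score: estimated contribution to task completion


def select_step(candidates):
    """Return the candidate with the highest combined process reward.

    Assumption: the cumulative reward is the plain sum of both scores.
    """
    return max(candidates, key=lambda c: c.on_the_spot + c.latent)


# Three made-up candidates for one step of a movie-database task.
candidates = [
    Candidate("search_movie('Inception')", on_the_spot=1.0, latent=0.4),
    Candidate("get_movie_credits(movie_id)", on_the_spot=1.0, latent=0.7),
    Candidate("get_movie_credits()", on_the_spot=0.0, latent=0.9),
]

best = select_step(candidates)
print(best.code)
```

In this toy setup the third candidate has the highest latent score but a failed invocation, so the second candidate wins on the combined score. How the actual framework weights or estimates the two rewards is not stated in the abstract.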
CL Oct 19, 2024
TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation
Xiaocheng Zhang, Xi Wang, Yifei Lu et al.
Fact-checking benchmarks provide standardized testing criteria for automated fact-checking systems, driving technological advancement. With the surge of misinformation on social media and the emergence of various fact-checking methods, public concern about the transparency of automated systems and the accuracy of fact-checking for high-influence events has grown. However, existing benchmarks fail to meet these urgent needs and are predominantly English-centric, hindering the progress of comprehensive fact-checking. To address these issues, we introduce TrendFact, the first benchmark capable of evaluating hotspot perception ability (HPA) and all fact-checking tasks. TrendFact consists of 7,643 curated samples sourced from trending platforms and professional fact-checking datasets, as well as an evidence library containing 366,634 entries with publication dates. Additionally, to complement existing benchmarks in evaluating system explanation consistency and HPA, we propose two new metrics: ECS and HCPI. Experimental results show that current fact-checking systems face significant limitations when evaluated on TrendFact, which facilitates the development of more robust fact-checking methods. Furthermore, to enhance the capabilities of existing advanced fact-checking systems, namely reasoning large language models (RLMs), we propose FactISR, a reasoning framework that integrates dynamic evidence augmentation with influence score-based iterative self-reflection. FactISR effectively improves the performance of RLMs, offering new insights into explainable and complex fact-checking.
CL Sep 16, 2025
ConvergeWriter: Data-Driven Bottom-Up Article Construction
Binquan Ji, Jiaqi Wang, Ruiting Li et al.
Large Language Models (LLMs) have shown remarkable prowess in text generation, yet producing long-form, factual documents grounded in extensive external knowledge bases remains a significant challenge. Existing "top-down" methods, which first generate a hypothesis or outline and then retrieve evidence, often suffer from a disconnect between the model's plan and the available knowledge, leading to content fragmentation and factual inaccuracies. To address these limitations, we propose a novel "bottom-up," data-driven framework that inverts the conventional generation pipeline. Our approach is predicated on a "Retrieval-First for Knowledge, Clustering for Structure" strategy, which first establishes the "knowledge boundaries" of the source corpus before any generative planning occurs. Specifically, we perform exhaustive iterative retrieval from the knowledge base and then employ an unsupervised clustering algorithm to organize the retrieved documents into distinct "knowledge clusters." These clusters form an objective, data-driven foundation that directly guides the subsequent generation of a hierarchical outline and the final document content. This bottom-up process ensures that the generated text is strictly constrained by and fully traceable to the source material, proactively adapting to the finite scope of the knowledge base and fundamentally mitigating the risk of hallucination. Experimental results on both 14B and 32B parameter models demonstrate that our method achieves performance comparable to or exceeding state-of-the-art baselines, and is expected to demonstrate unique advantages in knowledge-constrained scenarios that demand high fidelity and structural coherence. Our work presents an effective paradigm for generating reliable, structured, long-form documents, paving the way for more robust LLM applications in high-stakes, knowledge-intensive domains.
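The "retrieve first, then cluster, then outline" ordering in the ConvergeWriter abstract can be sketched as follows. This is a toy illustration only: the abstract does not name the clustering algorithm, so the example swaps in a greedy threshold clustering over Jaccard similarity of word sets. The function names, the threshold, and the sample documents are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy clustering: join the first cluster whose seed document is
    similar enough, otherwise start a new cluster.

    This is a stand-in for the unsupervised clustering step; the actual
    algorithm is not specified in the abstract.
    """
    token_sets = [set(d.lower().split()) for d in docs]
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            if jaccard(tokens, token_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical documents already returned by the retrieval stage.
retrieved = [
    "solar panels convert sunlight into electricity",
    "solar panels convert light into electricity efficiently",
    "wind turbines generate electricity from wind",
    "wind turbines generate power from moving wind",
]

clusters = cluster_documents(retrieved)

# Each cluster becomes one outline section, so the outline is bounded
# by what the knowledge base actually contains.
outline = [f"Section {n + 1}: documents {c}" for n, c in enumerate(clusters)]
print(outline)
```

The point of the sketch is the ordering: the outline is derived from the clusters rather than planned in advance, so no section can exist without supporting documents.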
CL Jul 23, 2025
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards
Cheng Liu, Yifei Lu, Fanghua Ye et al.
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). Existing approaches typically rely on prompt engineering or supervised fine-tuning to enable models to imitate character behaviors in specific scenarios, but often neglect the underlying cognitive mechanisms driving these behaviors. Inspired by cognitive psychology, we introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment. To further optimize the performance, we employ reinforcement learning with two general-purpose reward schemes designed for open-domain text generation. Extensive experiments on the CoSER benchmark, as well as Cross-MR and LifeChoice, demonstrate that CogDual consistently outperforms existing baselines and generalizes effectively across diverse role-playing tasks.
CL Jun 21, 2025
Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering
Binquan Ji, Haibo Luo, Yifei Lu et al.
Knowledge-intensive multi-hop question answering (QA) tasks, which require integrating evidence from multiple sources to address complex queries, often necessitate multiple rounds of retrieval and iterative generation by large language models (LLMs). However, incorporating many documents and extended contexts poses challenges, such as hallucinations and semantic drift, for lightweight LLMs with fewer parameters. This work proposes a novel framework called DEC (Dynamic Enhancement Chain). DEC first decomposes complex questions into logically coherent subquestions to form a hallucination-free reasoning chain. It then iteratively refines these subquestions through context-aware rewriting to generate effective query formulations. For retrieval, we introduce a lightweight discriminative keyword extraction module that leverages extracted keywords to achieve targeted, precise document recall with relatively low computational overhead. Extensive experiments on three multi-hop QA datasets demonstrate that DEC performs on par with or surpasses state-of-the-art benchmarks while significantly reducing token consumption. Notably, our approach attains state-of-the-art results on models with 8B parameters, showcasing its effectiveness in various scenarios, particularly in resource-constrained environments.
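The keyword-driven retrieval step in the DEC abstract can be illustrated with a small sketch. It is not the paper's module: keyword extraction is replaced here by plain stopword filtering, documents are ranked by keyword overlap, and the subquestions are written by hand in place of the LLM decomposition and rewriting stages. The stopword list, function names, and corpus are all assumptions.

```python
# Hypothetical stopword list; a real module would learn which words are discriminative.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "who",
             "what", "which", "where", "did", "that"}


def extract_keywords(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = [w.strip("?.,").lower() for w in text.split()]
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(question, corpus, top_k=1):
    """Rank documents by how many question keywords they share."""
    keywords = extract_keywords(question)
    ranked = sorted(
        corpus,
        key=lambda doc: len(keywords & extract_keywords(doc)),
        reverse=True,
    )
    return ranked[:top_k]


corpus = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
    "Paris is the capital of France.",
]

# Hand-written stand-ins for the decomposed and rewritten subquestions of
# "Where was the director of Inception born?"
sub_questions = ["Who directed Inception?", "Where was Christopher Nolan born?"]

evidence = [retrieve(q, corpus)[0] for q in sub_questions]
print(evidence)
```

Because each subquestion is matched on a handful of keywords rather than on its full text, each retrieval round stays cheap, which is the property the abstract emphasizes for lightweight models.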