Ning Wu

h-index32

7papers

342citations

Novelty58%

AI Score37

Ranked #91,139 of 194,257 authors (top 47%)#17,111 in CL (top 56%)

7 Papers

41.7LGSep 2, 2024Code

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu, Xu Huang, Xingshan Zeng et al.

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

14.5CLMar 27, 2023

Large Language Models are Diverse Role-Players for Summarization Evaluation

Ning Wu, Ming Gong, Linjun Shou et al.

Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most of the automatic evaluation methods like BLUE/ROUGE may be not able to adequately capture the above dimensions. In this paper, we propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on roleplayers prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators.

2.0LGMar 21, 2023Code

Indeterminate Probability Theory

Tao Yang, Chuang Liu, Xiaofeng Ma et al.

Complex continuous or mixed joint distributions (e.g., P(Y | z_1, z_2, ..., z_N)) generally lack closed-form solutions, often necessitating approximations such as MCMC. This paper proposes Indeterminate Probability Theory (IPT), which makes the following contributions: (1) An observer-centered framework in which experimental outcomes are represented as distributions combining ground truth with observation error; (2) The introduction of three independence candidate axioms that enable a two-phase probabilistic inference framework; (3) The derivation of closed-form solutions for arbitrary complex joint distributions under this framework. Both the Indeterminate Probability Neural Network (IPNN) model and the non-neural multivariate time series forecasting application demonstrate IPT's effectiveness in modeling high-dimensional distributions, with successful validation up to 1000 dimensions. Importantly, IPT is consistent with classical probability theory and subsumes the frequentist equation in the limit of vanishing observation error.

6.8CLDec 7, 2023Code

Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

Nuo Chen, Ning Wu, Shining Liang et al.

This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. We unveil several key and uncommon findings based on the designed probing tasks: (1) Horizontally, enlarging model sizes almost could not automatically impart additional knowledge or computational prowess. Instead, it can enhance reasoning abilities, especially in math problem solving, and helps reduce hallucinations, but only beyond certain size thresholds; (2) In vertical analysis, the lower layers of LLaMA lack substantial arithmetic and factual knowledge, showcasing logical thinking, multilingual and recognitive abilities, with top layers housing most computational power and real-world knowledge.

1.4CVAug 5, 2022

Joint Attention-Driven Domain Fusion and Noise-Tolerant Learning for Multi-Source Domain Adaptation

Tong Xu, Lin Wang, Wu Ning et al.

As a study on the efficient usage of data, Multi-source Unsupervised Domain Adaptation transfers knowledge from multiple source domains with labeled data to an unlabeled target domain. However, the distribution discrepancy between different domains and the noisy pseudo-labels in the target domain both lead to performance bottlenecks of the Multi-source Unsupervised Domain Adaptation methods. In light of this, we propose an approach that integrates Attention-driven Domain fusion and Noise-Tolerant learning (ADNT) to address the two issues mentioned above. Firstly, we establish a contrary attention structure to perform message passing between features and to induce domain movement. Through this approach, the discriminability of the features can also be significantly improved while the domain discrepancy is reduced. Secondly, based on the characteristics of the unsupervised domain adaptation training, we design an Adaptive Reverse Cross Entropy loss, which can directly impose constraints on the generation of pseudo-labels. Finally, combining these two approaches, experimental results on several benchmarks further validate the effectiveness of our proposed ADNT and demonstrate superior performance over the state-of-the-art methods.

10.3STFeb 18, 2024

Ploutos: Towards interpretable stock movement prediction with financial large language model

Hanshuang Tong, Jun Li, Ning Wu et al.

Recent advancements in large language models (LLMs) have opened new pathways for many domains. However, the full potential of LLMs in financial investments remains largely untapped. There are two main challenges for typical deep learning-based methods for quantitative finance. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional methods lack clarity and interpretability, which impedes their application in scenarios where the justification for predictions is essential. To solve the above challenges, we propose Ploutos, a novel financial LLM framework that consists of PloutosGen and PloutosGPT. The PloutosGen contains multiple primary experts that can analyze different modal data, such as text and numbers, and provide quantitative strategies from different perspectives. Then PloutosGPT combines their insights and predictions and generates interpretable rationales. To generate accurate and faithful rationales, the training strategy of PloutosGPT leverage rearview-mirror prompting mechanism to guide GPT-4 to generate rationales, and a dynamic token weighting mechanism to finetune LLM by increasing key tokens weight. Extensive experiments show our framework outperforms the state-of-the-art methods on both prediction accuracy and interpretability.

12.2CLJun 20, 2024Code

Selected Languages are All You Need for Cross-lingual Truthfulness Transfer

Weihao Liu, Ning Wu, Wenbiao Ding et al.

Truthfulness stands out as an essential challenge for Large Language Models (LLMs). Although many works have developed various ways for truthfulness enhancement, they seldom focus on truthfulness in multilingual scenarios. Meanwhile, contemporary multilingual aligning technologies struggle to balance numerous languages and often exhibit serious truthfulness gaps across different languages, especially those that differ greatly from English. In our work, we extend truthfulness evaluation to multilingual contexts and propose a practical method for cross-lingual truthfulness transfer called Fact-aware Multilingual Selective Synergy (FaMSS). FaMSS is able to select an optimal subset of all tested languages by language bias and transfer contributions, and then employ translation instruction tuning for cross-lingual truthfulness transfer. Experimental results demonstrate that our approach can effectively reduce the multilingual representation disparity and boost cross-lingual truthfulness transfer of LLMs.