Rui Wang

CL
h-index7
5papers
2,105citations
Novelty56%
AI Score39

5 Papers

18.2LGFeb 29, 2024
Beyond Language Models: Byte Models are Digital World Simulators

Shangda Wu, Xu Tan, Zili Wang et al.

Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format. Inspired by the success of next token prediction in natural language processing, we introduce bGPT, a model with next byte prediction to simulate the digital world. bGPT matches specialized models in performance across various modalities, including text, audio, and images, and offers new possibilities for predicting, simulating, and diagnosing algorithm or hardware behaviour. It has almost flawlessly replicated the process of converting symbolic music data, achieving a low error rate of 0.0011 bits per byte in converting ABC notation to MIDI format. In addition, bGPT demonstrates exceptional capabilities in simulating CPU behaviour, with an accuracy exceeding 99.99% in executing various operations. Leveraging next byte prediction, models like bGPT can directly learn from vast binary data, effectively simulating the intricate patterns of the digital world.

3.4CLMar 26, 2024Code
A Large Language Model Guided Topic Refinement Mechanism for Short Text Modeling

Shuyu Chang, Rui Wang, Peng Ren et al.

Modeling topics effectively in short texts, such as tweets and news snippets, is crucial to capturing rapidly evolving social trends. Existing topic models often struggle to accurately capture the underlying semantic patterns of short texts, primarily due to the sparse nature of such data. This nature of texts leads to an unavoidable lack of co-occurrence information, which hinders the coherence and granularity of mined topics. This paper introduces a novel model-agnostic mechanism, termed Topic Refinement, which leverages the advanced text comprehension capabilities of Large Language Models (LLMs) for short-text topic modeling. Unlike traditional methods, this post-processing mechanism enhances the quality of topics extracted by various topic modeling methods through prompt engineering. We guide LLMs in identifying semantically intruder words within the extracted topics and suggesting coherent alternatives to replace these words. This process mimics human-like identification, evaluation, and refinement of the extracted topics. Extensive experiments on four diverse datasets demonstrate that Topic Refinement boosts the topic quality and improves the performance in topic-related text classification tasks.

31.4LGMar 19, 2024Code
AffineQuant: Affine Transformation Quantization for Large Language Models

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training. Existing PTQ methods for LLMs limit the optimization scope to scaling transformations between pre- and post-quantization weights. In this paper, we advocate for the direct optimization using equivalent Affine transformations in PTQ (AffineQuant). This approach extends the optimization scope and thus significantly minimizing quantization errors. Additionally, by employing the corresponding inverse matrix, we can ensure equivalence between the pre- and post-quantization outputs of PTQ, thereby maintaining its efficiency and generalization capabilities. To ensure the invertibility of the transformation during optimization, we further introduce a gradual mask optimization method. This method initially focuses on optimizing the diagonal elements and gradually extends to the other elements. Such an approach aligns with the Levy-Desplanques theorem, theoretically ensuring invertibility of the transformation. As a result, significant performance improvements are evident across different LLMs on diverse datasets. To illustrate, we attain a C4 perplexity of 15.76 (2.26 lower vs 18.02 in OmniQuant) on the LLaMA2-7B model of W4A4 quantization without overhead. On zero-shot tasks, AffineQuant achieves an average of 58.61 accuracy (1.98 lower vs 56.63 in OmniQuant) when using 4/4-bit quantization for LLaMA-30B, which setting a new state-of-the-art benchmark for PTQ in LLMs.

31.1CLSep 29, 2020
Neural Topic Modeling by Incorporating Document Relationship Graph

Deyu Zhou, Xuemeng Hu, Rui Wang

Graph Neural Networks (GNNs) that capture the relationships between graph nodes via message passing have been a hot research direction in the natural language processing community. In this paper, we propose Graph Topic Model (GTM), a GNN based neural topic model that represents a corpus as a document relationship graph. Documents and words in the corpus become nodes in the graph and are connected based on document-word co-occurrences. By introducing the graph structure, the relationships between documents are established through their shared words and thus the topical representation of a document is enriched by aggregating information from its neighboring nodes using graph convolution. Extensive experiments on three datasets were conducted and the results demonstrate the effectiveness of the proposed approach.

30.3CLAug 25, 2019
Open Event Extraction from Online Text using a Generative Adversarial Network

Rui Wang, Deyu Zhou, Yulan He

To extract the structured representations of open-domain events, Bayesian graphical models have made some progress. However, these approaches typically assume that all words in a document are generated from a single event. While this may be true for short text such as tweets, such an assumption does not generally hold for long text such as news articles. Moreover, Bayesian graphical models often rely on Gibbs sampling for parameter inference which may take long time to converge. To address these limitations, we propose an event extraction model based on Generative Adversarial Nets, called Adversarial-neural Event Model (AEM). AEM models an event with a Dirichlet prior and uses a generator network to capture the patterns underlying latent events. A discriminator is used to distinguish documents reconstructed from the latent events and the original documents. A byproduct of the discriminator is that the features generated by the learned discriminator network allow the visualization of the extracted events. Our model has been evaluated on two Twitter datasets and a news article dataset. Experimental results show that our model outperforms the baseline approaches on all the datasets, with more significant improvements observed on the news article dataset where an increase of 15\% is observed in F-measure.