Ming Ren

7 papers

89 citations

Novelty 49%

AI Score 27

Ranked #159,303 of 201,326 authors (top 79%)

#27,673 in CL (top 85%)

7 Papers

CL · May 25, 2022

A Zipf's Law-based Text Generation Approach for Addressing Imbalance in Entity Extraction

Zhenhua Wang, Ming Ren, Dong Gao et al.

Entity extraction is critical to intelligent applications across diverse domains, but its effectiveness is challenged by data imbalance. This paper proposes a novel approach that views the issue through quantitative information: some entities are common while others are scarce, and this is reflected in the quantifiable distribution of words. Zipf's law is well suited to this setting; to move from words to entities, words within the documents are classified as common or rare. Sentences are then likewise classified as common or rare and processed by text generation models accordingly. Rare entities within the generated sentences are labeled using human-designed rules and added to the raw dataset as a supplement, mitigating the imbalance problem. The study presents a case of extracting entities from technical documents, and experimental results on two datasets support the effectiveness of the proposed method. Furthermore, the significance of Zipf's law in driving the progress of AI is discussed, broadening the reach and coverage of Informetrics. This paper demonstrates how Informetrics can be extended to interface with AI through Zipf's law.
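The word-level and sentence-level split described in this abstract can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from the paper: words are ranked by frequency, as in a Zipf rank-frequency analysis, and the head of the ranking that covers a chosen share of all tokens is treated as common. The function names and the `coverage` threshold are hypothetical.

```python
from collections import Counter


def split_common_rare(tokens, coverage=0.8):
    """Split a vocabulary into common and rare words by rank-frequency.

    `coverage` is a hypothetical cutoff (not a value from the paper):
    words are added to the common set, most frequent first, until the
    words already added account for at least `coverage` of all tokens.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Rank by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    common, rare = set(), set()
    covered = 0
    for word, freq in ranked:
        if covered / total < coverage:
            common.add(word)
        else:
            rare.add(word)
        covered += freq
    return common, rare


def is_rare_sentence(sentence_tokens, rare_words):
    """Assumed rule: a sentence is rare if it contains any rare word."""
    return any(tok in rare_words for tok in sentence_tokens)


tokens = ["valve"] * 6 + ["pump"] * 3 + ["flange"]
common_words, rare_words = split_common_rare(tokens)
```

With this toy input, `valve` and `pump` land in the common set and `flange` in the rare set, so any sentence mentioning `flange` would be routed to the rare-sentence branch of a generation pipeline.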

CL · Sep 12, 2022

A new hazard event classification model via deep learning and multifractal

Zhenhua Wang, Bin Wang, Ming Ren et al.

Hazard and operability analysis (HAZOP) is a paradigm of industrial safety that reveals the hazards of a process from its node deviations, consequences, causes, measures and suggestions; such hazards can be considered hazard events (HaE). Research on HaE classification has considerable practical value. In this paper, we present a novel deep learning model termed DLF that uses multifractal analysis to explore HaE classification, motivated by the observation that an HaE can naturally be regarded as a kind of time series. Specifically, an HaE is first vectorized with BERT to obtain an HaE time series. Then, a new multifractal analysis method termed HmF-DFA is proposed to derive an HaE fractal series from the HaE time series. Finally, a new hierarchical gating neural network (HGNN) is designed to process the HaE fractal series and classify HaE from three aspects: severity, possibility and risk. We take HAZOP reports of 18 processes as cases and conduct experiments on this basis. Results demonstrate that, compared with other classifiers, the DLF classifier performs better in terms of precision, recall and F1-score, especially for the severity aspect. HmF-DFA and HGNN also each improve HaE classification. Our HaE classification system can be of practical use to experts, engineers, employees, and other enterprises. We hope our research can contribute added support to the daily practice of industrial safety.

CL · Sep 10, 2022

Yes, DLGM! A novel hierarchical model for hazard classification

Zhenhua Wang, Ming Ren, Dong Gao et al.

Hazards can be exposed by HAZOP as text information, and studying their classification is of great significance to the development of industrial informatics, as it supports safety early warning, decision support, policy evaluation, etc. However, there is currently no research on this important problem. In this paper, we propose a novel deep learning model termed DLGM for hazard classification. Specifically, we first leverage BERT to vectorize the hazard and treat it as a type of time series (HTS). Second, we build a grey model FSGM(1, 1) to model it and obtain grey guidance in the form of the structural parameters. Finally, we design a hierarchical-feature fusion neural network (HFFNN) to investigate the HTS with grey guidance (HTSGG) from three themes, where HFFNN is a hierarchical structure with four modules: two feature encoders, a gating mechanism, and a deepening mechanism. We take 18 industrial processes as application cases and conduct a series of experiments. The experimental results show that DLGM is promising for hazard classification and that FSGM(1, 1) and HFFNN are effective. We hope our research can contribute added value and support to the daily practice of industrial safety.

CL · Jun 29, 2024

LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods

Zhenhua Wang, Guang Xu, Ming Ren

With the ascent of large language models (LLMs), natural language processing has seen advances such as LLM-based data augmentation. Nonetheless, prior research has two primary shortcomings: first, it has not considered whether the natural language generated by LLMs (LLMNL) truly aligns with human natural language (HNL), a critical foundational question; second, it has overlooked that augmented data is randomly generated by the LLM, so not all of it may have equal training value, which could impede the performance of classifiers. To address these challenges, we introduce scaling laws to intrinsically characterize LLMNL and HNL. Through extensive experiments, we reveal slight deviations (approximately 0.2 Mandelbrot exponent) from Mandelbrot's law in LLMNL, underscore a complexity advantage in HNL, and supplement an interpretive discussion on language style. This establishes a solid foundation for the expansion of LLMs. Further, we introduce a novel data augmentation method for few-shot text classification, termed ZGPTDA, which leverages fuzzy computing mechanisms driven by conformity to scaling laws to make decisions about GPT-4 augmented data. Extensive experiments, conducted in real-world scenarios, confirm the effectiveness (improving F1 of BERT and RoBERTa by 7-10%) and competitiveness (surpassing the recent AugGPT and GENCO methods by about 2% accuracy on DeBERTa) of ZGPTDA. In addition, we reveal some interesting insights, e.g., Hilberg's law and Taylor's law can bring more benefits to text classification.
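For reference, Mandelbrot's law (the Zipf-Mandelbrot law) is commonly written in the form below, where $f(r)$ is the frequency of the word with rank $r$. The symbols $C$, $\beta$, and $\alpha$ are standard textbook notation rather than notation taken from the paper, and reading the reported deviation of about 0.2 as a difference in the exponent $\alpha$ is an assumption based on the abstract's wording.

```latex
f(r) = \frac{C}{(r + \beta)^{\alpha}},
\qquad
\log f(r) = \log C - \alpha \log (r + \beta)
```

Setting $\beta = 0$ recovers Zipf's law, $f(r) \propto r^{-\alpha}$. In the log form, $\alpha$ is the slope magnitude of the rank-frequency curve, which is why it is a natural quantity for comparing two text sources.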

CL · Sep 1, 2023

Exploring the law of text geographic information

Zhenhua Wang, Daiyu Zhang, Ming Ren et al.

Textual geographic information is indispensable and heavily relied upon in practical applications. The absence of a clear distributional law makes it hard to harness geographic information effectively, which motivates our exploration. We contend that geographic information is influenced by human behavior, cognition, expression, and thought processes, and given our intuitive understanding of natural systems, we hypothesize that it conforms to the Gamma distribution. Through rigorous experiments on a diverse range of 24 datasets encompassing different languages and types, we have substantiated this hypothesis, uncovering the regularities that govern the quantity, length, and distance of geographic information. Furthermore, theoretical analyses and comparisons with Gaussian distributions and Zipf's law have ruled out the possibility that these laws are coincidental. Significantly, we have estimated the upper bounds of human utilization of geographic information, pointing towards the existence of uncharted territories. We also provide guidance for geographic information extraction. We hope this work helps lift the veil on geographic information and reveal its true nature.
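The abstract does not say how the Gamma parameters were estimated. As a minimal sketch, assuming a method-of-moments fit (not necessarily the authors' procedure), the shape and scale of a Gamma distribution can be estimated from the sample mean and variance of a measured quantity, such as the number of place names per document. The function name and the sample values are hypothetical.

```python
import statistics


def fit_gamma_moments(samples):
    """Estimate Gamma (shape, scale) by the method of moments.

    For a Gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so
    shape = mean**2 / variance and scale = variance / mean.
    """
    mean = statistics.mean(samples)
    var = statistics.pvariance(samples)  # population variance
    if mean <= 0 or var <= 0:
        raise ValueError("need positive, non-constant samples")
    shape = mean * mean / var
    scale = var / mean
    return shape, scale


# Hypothetical counts of geographic mentions per document.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
shape, scale = fit_gamma_moments(counts)
```

For this illustrative sample the mean is 5 and the population variance is 4, giving a shape of 6.25 and a scale of 0.8. A maximum-likelihood fit would generally give somewhat different values; the moment estimates are used here only because they have a closed form.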

CL · Sep 1, 2023

Will sentiment analysis need subculture? A new data augmentation approach

Zhenhua Wang, Simin He, Guang Xu et al.

The omnipresence of the Internet has fostered a subculture that congregates around the contemporary milieu. This subculture articulates the intricacies of human feelings through its ardent pursuit of novelty, a fact that cannot be disregarded in sentiment analysis. This paper aims to enrich data through the lens of subculture in order to address the shortage of training data in sentiment analysis. To this end, a new approach, subculture-based data augmentation (SCDA), is proposed, which produces enhanced texts for each training text by creating dedicated subcultural expression generators. Extensive experiments attest to the effectiveness and potential of SCDA. The results also show that different subcultural expressions elicit varying degrees of sentiment stimulation. Moreover, an intriguing conjecture arises, suggesting the linear reversibility of certain subcultural expressions.

LG · Nov 27, 2021

A New Multifractal-based Deep Learning Model for Text Mining

Zhenhua Wang, Ming Ren, Dong Gao

In this world full of uncertainty, where the fabric of existence weaves patterns of complexity, multifractals emerge as beacons of insight, illuminating those patterns. As we delve into the realm of text mining, which underpins various natural language processing applications and powers a range of intelligent services, we recognize that behind the veil of text lies a manifestation of human thought and cognition, intricately intertwined with that complexity. Building upon the view of text as a complex system, this study sets out to unravel the hidden treasures within, armed with the proposed multifractal method, which deciphers the multifractal attributes embedded in the text landscape. This endeavor culminates in our novel model, which also harnesses the proposed activation function to facilitate nonlinear information transmission within its neural network architecture. The success of experiments on real-world technical reports, covering the extraction of technical terms and the classification of hazard events, stands as a testament to our endeavors. This research not only expands our understanding of text mining but also opens new horizons for knowledge discovery across various domains.