CLJul 15, 2024Code
MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language ModelsChengguang Gan, Sunbowen Lee, Qingyu Yin et al.
The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that encompasses 21 sub-datasets in English, Japanese, and Chinese. In this paper, we also propose a method for dataset translation assisted by Large Language Models (LLMs), which significantly reduces the manual annotation time required for dataset construction by leveraging LLMs to translate the original Japanese datasets. Additionally, we have enriched the dataset by incorporating open-domain Named Entity Recognition (NER) and sentence classification tasks. Utilizing this expanded dataset, we developed a unified input-output framework to train an Open-domain Information Extraction Large Language Model (OIELLM). The OIELLM model demonstrates the capability to effectively process novel MMM datasets, exhibiting significant improvements in performance. The OIELLM model and datasets is open-source in HuggingFace: https://ganchengguang.github.io/MRE/
CLSep 20, 2022
A Few-shot Approach to Resume Information Extraction via PromptsChengguang Gan, Tatsunori Mori
Prompt learning's fine-tune performance on text classification tasks has attracted the NLP community. This paper applies it to resume information extraction, improving existing methods for this task. We created manual templates and verbalizers tailored to resume texts and compared the performance of Masked Language Model (MLM) and Seq2Seq PLMs. Also, we enhanced the verbalizer design for Knowledgeable Prompt-tuning, contributing to prompt template design across NLP tasks. We present the Manual Knowledgeable Verbalizer (MKV), a rule for constructing verbalizers for specific applications. Our tests show that MKV rules yield more effective, robust templates and verbalizers than existing methods. Our MKV approach resolved sample imbalance, surpassing current automatic prompt methods. This study underscores the value of tailored prompt learning for resume extraction, stressing the importance of custom-designed templates and verbalizers.
CLJun 28, 2023
Sentence-to-Label Generation Framework for Multi-task Learning of Japanese Sentence Classification and Named Entity RecognitionChengguang Gan, Qinghao Zhang, Tatsunori Mori
Information extraction(IE) is a crucial subfield within natural language processing. In this study, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset containing both SC and NER. Using a format converter, we unify input formats and employ a generative model to generate SC-labels, NER-labels, and associated text segments. We propose a Constraint Mechanism (CM) to improve generated format accuracy. Our results show SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to standalone tasks, with CM raising format accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects between SC and NER, and integration enhances both tasks' performance.
CLAug 5, 2022
Construction of English Resume Corpus and Test with Pre-trained Language ModelsChengguang Gan, Tatsunori Mori
Information extraction(IE) has always been one of the essential tasks of NLP. Moreover, one of the most critical application scenarios of information extraction is the information extraction of resumes. Constructed text is obtained by classifying each part of the resume. It is convenient to store these texts for later search and analysis. Furthermore, the constructed resume data can also be used in the AI resume screening system. Significantly reduce the labor cost of HR. This study aims to transform the information extraction task of resumes into a simple sentence classification task. Based on the English resume dataset produced by the prior study. The classification rules are improved to create a larger and more fine-grained classification dataset of resumes. This corpus is also used to test some current mainstream Pre-training language models (PLMs) performance.Furthermore, in order to explore the relationship between the number of training samples and the correctness rate of the resume dataset, we also performed comparison experiments with training sets of different train set sizes.The final multiple experimental results show that the resume dataset with improved annotation rules and increased sample size of the dataset improves the accuracy of the original resume dataset.
CLNov 12, 2023
GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement EffectChengguang Gan, Qinghao Zhang, Tatsunori Mori
Information Extraction (IE) stands as a cornerstone in natural language processing, traditionally segmented into distinct sub-tasks. The advent of Large Language Models (LLMs) heralds a paradigm shift, suggesting the feasibility of a singular model addressing multiple IE subtasks. In this vein, we introduce the General Information Extraction Large Language Model (GIELLM), which integrates text Classification, Sentiment Analysis, Named Entity Recognition, Relation Extraction, and Event Extraction using a uniform input-output schema. This innovation marks the first instance of a model simultaneously handling such a diverse array of IE subtasks. Notably, the GIELLM leverages the Mutual Reinforcement Effect (MRE), enhancing performance in integrated tasks compared to their isolated counterparts. Our experiments demonstrate State-of-the-Art (SOTA) results in five out of six Japanese mixed datasets, significantly surpassing GPT-3.5-Turbo. Further, an independent evaluation using the novel Text Classification Relation and Event Extraction(TCREE) dataset corroborates the synergistic advantages of MRE in text and word classification. This breakthrough paves the way for most IE subtasks to be subsumed under a singular LLM framework. Specialized fine-tune task-specific models are no longer needed.
CLSep 7, 2023
USA: Universal Sentiment Analysis Model & Construction of Japanese Sentiment Text Classification and Part of Speech DatasetChengguang Gan, Qinghao Zhang, Tatsunori Mori
Sentiment analysis is a pivotal task in the domain of natural language processing. It encompasses both text-level sentiment polarity classification and word-level Part of Speech(POS) sentiment polarity determination. Such analysis challenges models to understand text holistically while also extracting nuanced information. With the rise of Large Language Models(LLMs), new avenues for sentiment analysis have opened. This paper proposes enhancing performance by leveraging the Mutual Reinforcement Effect(MRE) between individual words and the overall text. It delves into how word polarity influences the overarching sentiment of a passage. To support our research, we annotated four novel Sentiment Text Classification and Part of Speech(SCPOS) datasets, building upon existing sentiment classification datasets. Furthermore, we developed a Universal Sentiment Analysis(USA) model, with a 7-billion parameter size. Experimental results revealed that our model surpassed the performance of gpt-3.5-turbo across all four datasets, underscoring the significance of MRE in sentiment analysis.
CLJul 18, 2023
Mutual Reinforcement Effects in Japanese Sentence Classification and Named Entity Recognition TasksChengguang Gan, Qinghao Zhang, Tatsunori Mori
Information extraction(IE) is a crucial subfield within natural language processing. However, for the traditionally segmented approach to sentence classification and Named Entity Recognition, the intricate interactions between these individual subtasks remain largely uninvestigated. In this study, we propose an integrative analysis, converging sentence classification with Named Entity Recognition, with the objective to unveil and comprehend the mutual reinforcement effect within these two information extraction subtasks. To achieve this, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset containing both SC and NER. Using a format converter, we unify input formats and employ a generative model to generate SC-labels, NER-labels, and associated text segments. We propose a Constraint Mechanism (CM) to improve generated format accuracy. Our results show SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to standalone tasks, with CM raising format accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects between SC and NER, and integration enhances both tasks' performance. We additionally implemented the SLG framework on single SC task. It yielded superior accuracies compared to the baseline on two distinct Japanese SC datasets. Notably, in the experiment of few-shot learning, SLG framework shows much better performance than fine-tune method. These empirical findings contribute additional evidence to affirm the efficacy of the SLG framework.
CLFeb 2
GuideWeb: A Benchmark for Automatic In-App Guide Generation on Real-World Web UIsChengguang Gan, Yoshihiro Tsujii, Yunhao Liang et al.
Digital Adoption Platform (DAP) provide web-based overlays that deliver operation guidance and contextual hints to help users navigate complex websites. Although modern DAP tools enable non-experts to author such guidance, maintaining these guides remains labor-intensive because website layouts and functionalities evolve continuously, which requires repeated manual updates and re-annotation. In this work, we introduce \textbf{GuideWeb}, a new benchmark for automatic in-app guide generation on real-world web UIs. GuideWeb formulates the task as producing page-level guidance by selecting \textbf{guide target elements} grounded in the webpage and generating concise guide text aligned with user intent. We also propose a comprehensive evaluation suite that jointly measures the accuracy of guide target element selection and the quality of generated intents and guide texts. Experiments show that our proposed \textbf{GuideWeb Agent} achieves \textbf{30.79\%} accuracy in guide target element prediction, while obtaining BLEU scores of \textbf{44.94} for intent generation and \textbf{21.34} for guide-text generation. Existing baselines perform substantially worse, which highlights that automatic guide generation remains challenging and that further advances are necessary before such systems can be reliably deployed in real-world settings.
CLJun 9, 2024Code
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language ModelsZiqiang Liu, Feiteng Fang, Xi Feng et al.
The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.
CLDec 6, 2023
Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level ClassificationChengguang Gan, Qinghao Zhang, Tatsunori Mori
The proliferation of Large Language Models (LLMs) has spurred extensive research into LLM-related Prompt investigations, such as Instruction Learning (IL), In-context Learning (ICL), and Chain-of-Thought (CoT). These approaches aim to improve LLMs' responses by enabling them to provide concise statements or examples for deeper contemplation when addressing questions. However, independent thinking by LLMs can introduce variability in their thought processes, leading to potential inaccuracies. In response, our study seeks to bridge the gap between LLM and human-like thinking processes, recognizing that text comprehension begins with understanding individual words. To tackle this challenge, we have expanded the CoT method to cater to a specific domain. Our approach, known as "Think from Words" (TFW), initiates the comprehension process at the word level and then extends it to encompass the entire text. We also propose "TFW with Extra word-level information" (TFW Extra), augmenting comprehension with additional word-level data. To assess our methods, we employ text classification on six Japanese datasets comprising text-level and word-level elements. Our findings not only validate the effectiveness of TFW but also shed light on the impact of various word-level information types on LLMs' text comprehension, offering insights into their potential to cause misinterpretations and errors in the overall comprehension of the final text.
CLApr 24, 2025
M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information ExtractionChengguang Gan, Zhixi Cai, Yanbin Wei et al.
Mutual Reinforcement Effect (MRE) is an emerging subfield at the intersection of information extraction and model interpretability. MRE aims to leverage the mutual understanding between tasks of different granularities, enhancing the performance of both coarse-grained and fine-grained tasks through joint modeling. While MRE has been explored and validated in the textual domain, its applicability to visual and multimodal domains remains unexplored. In this work, we extend MRE to the multimodal information extraction domain for the first time. Specifically, we introduce a new task: Multimodal Mutual Reinforcement Effect (M-MRE), and construct a corresponding dataset to support this task. To address the challenges posed by M-MRE, we further propose a Prompt Format Adapter (PFA) that is fully compatible with various Large Vision-Language Models (LVLMs). Experimental results demonstrate that MRE can also be observed in the M-MRE task, a multimodal text-image understanding scenario. This provides strong evidence that MRE facilitates mutual gains across three interrelated tasks, confirming its generalizability beyond the textual domain.
CLOct 13, 2024
Empirical Study of Mutual Reinforcement Effect and Application in Few-shot Text Classification Tasks via PromptChengguang Gan, Tatsunori Mori
The Mutual Reinforcement Effect (MRE) investigates the synergistic relationship between word-level and text-level classifications in text classification tasks. It posits that the performance of both classification levels can be mutually enhanced. However, this mechanism has not been adequately demonstrated or explained in prior research. To address this gap, we employ empirical experiment to observe and substantiate the MRE theory. Our experiments on 21 MRE mix datasets revealed the presence of MRE in the model and its impact. Specifically, we conducted compare experiments use fine-tune. The results of findings from comparison experiments corroborates the existence of MRE. Furthermore, we extended the application of MRE to prompt learning, utilizing word-level information as a verbalizer to bolster the model's prediction of text-level classification labels. In our final experiment, the F1-score significantly surpassed the baseline in 18 out of 21 MRE Mix datasets, further validating the notion that word-level information enhances the language model's comprehension of the text as a whole.
CLMar 5, 2024
Demonstrating Mutual Reinforcement Effect through Information FlowChengguang Gan, Xuzheng He, Qinghao Zhang et al.
The Mutual Reinforcement Effect (MRE) investigates the synergistic relationship between word-level and text-level classifications in text classification tasks. It posits that the performance of both classification levels can be mutually enhanced. However, this mechanism has not been adequately demonstrated or explained in prior research. To address this gap, we employ information flow analysis to observe and substantiate the MRE theory. Our experiments on six MRE hybrid datasets revealed the presence of MRE in the model and its impact. Additionally, we conducted fine-tuning experiments, whose results were consistent with those of the information flow experiments. The convergence of findings from both experiments corroborates the existence of MRE. Furthermore, we extended the application of MRE to prompt learning, utilizing word-level information as a verbalizer to bolster the model's prediction of text-level classification labels. In our final experiment, the F1-score significantly surpassed the baseline in five out of six datasets, further validating the notion that word-level information enhances the language model's comprehension of the text as a whole.
CLJan 16, 2024
Application of LLM Agents in Recruitment: A Novel Framework for Resume ScreeningChengguang Gan, Qinghao Zhang, Tatsunori Mori
The automation of resume screening is a crucial aspect of the recruitment process in organizations. Automated resume screening systems often encompass a range of natural language processing (NLP) tasks. This paper introduces a novel Large Language Models (LLMs) based agent framework for resume screening, aimed at enhancing efficiency and time management in recruitment processes. Our framework is distinct in its ability to efficiently summarize and grade each resume from a large dataset. Moreover, it utilizes LLM agents for decision-making. To evaluate our framework, we constructed a dataset from actual resumes and simulated a resume screening process. Subsequently, the outcomes of the simulation experiment were compared and subjected to detailed analysis. The results demonstrate that our automated resume screening framework is 11 times faster than traditional manual methods. Furthermore, by fine-tuning the LLMs, we observed a significant improvement in the F1 score, reaching 87.73\%, during the resume sentence classification phase. In the resume summarization and grading phase, our fine-tuned model surpassed the baseline performance of the GPT-3.5 model. Analysis of the decision-making efficacy of the LLM agents in the final offer stage further underscores the potential of LLM agents in transforming resume screening processes.
CLMay 15, 2023
Sensitivity and Robustness of Large Language Models to Prompt Template in Japanese Text Classification TasksChengguang Gan, Tatsunori Mori
Prompt engineering relevance research has seen a notable surge in recent years, primarily driven by advancements in pre-trained language models and large language models. However, a critical issue has been identified within this domain: the inadequate of sensitivity and robustness of these models towards Prompt Templates, particularly in lesser-studied languages such as Japanese. This paper explores this issue through a comprehensive evaluation of several representative Large Language Models (LLMs) and a widely-utilized pre-trained model(PLM). These models are scrutinized using a benchmark dataset in Japanese, with the aim to assess and analyze the performance of the current multilingual models in this context. Our experimental results reveal startling discrepancies. A simple modification in the sentence structure of the Prompt Template led to a drastic drop in the accuracy of GPT-4 from 49.21 to 25.44. This observation underscores the fact that even the highly performance GPT-4 model encounters significant stability issues when dealing with diverse Japanese prompt templates, rendering the consistency of the model's output results questionable. In light of these findings, we conclude by proposing potential research trajectories to further enhance the development and performance of Large Language Models in their current stage.