CLDec 13, 2024

TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering

arXiv:2412.10139v13 citationsh-index: 1
Originality Incremental advance
AI Analysis

This addresses the challenge for corpus linguists in automating qualitative analysis with LLMs, though it is incremental as it builds on existing prompting methods.

The paper tackles the problem of low performance, hallucination, and irreproducibility in using LLMs for corpus-based discourse analysis by proposing TACOMORE, a prompting framework that improves accuracy, ethicality, reasoning, and reproducibility in tasks like keyword, collocate, and concordance analysis on a COVID-19 corpus.

The capacity of LLMs to carry out automated qualitative analysis has been questioned by corpus linguists, and it has been argued that corpus-based discourse analysis incorporating LLMs is hindered by issues of unsatisfying performance, hallucination, and irreproducibility. Our proposed method, TACOMORE, aims to address these concerns by serving as an effective prompting framework in this domain. The framework consists of four principles, i.e., Task, Context, Model and Reproducibility, and specifies five fundamental elements of a good prompt, i.e., Role Description, Task Definition, Task Procedures, Contextual Information and Output Format. We conduct experiments on three LLMs, i.e., GPT-4o, Gemini-1.5-Pro and Gemini-1.5.Flash, and find that TACOMORE helps improve LLM performance in three representative discourse analysis tasks, i.e., the analysis of keywords, collocates and concordances, based on an open corpus of COVID-19 research articles. Our findings show the efficacy of the proposed prompting framework TACOMORE in corpus-based discourse analysis in terms of Accuracy, Ethicality, Reasoning, and Reproducibility, and provide novel insights into the application and evaluation of LLMs in automated qualitative studies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes