CL · AI · May 15, 2023

Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology

arXiv:2305.08339v5 · 56 citations
Originality: Incremental advance
AI Analysis

The paper addresses the scalability and efficiency of manual annotation for researchers in corpus-based pragmatics and discourse analysis. The advance is incremental: it applies existing LLM capabilities to a specific annotation task.

The study tackles the challenge of automating complex pragmatic and discourse annotation in corpus linguistics, which is typically manual and time-consuming. It tests two large language models (LLMs), GPT-3.5 and GPT-4, on annotating apology components and finds that GPT-4 outperformed GPT-3.5, with accuracy approaching that of a human coder.

Certain forms of linguistic annotation, like part of speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores the possibility of using large language models (LLMs) to automate pragma-discursive corpus annotation. We compare GPT-3.5 (the model behind the free-to-use version of ChatGPT), GPT-4 (the model underpinning the precise mode of Bing chatbot), and a human coder in annotating apology components in English based on the local grammar framework. We find that GPT-4 outperformed GPT-3.5, with accuracy approaching that of a human coder. These results suggest that LLMs can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient, scalable and accessible.
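The comparison described in the abstract comes down to scoring each annotator's labels against a reference annotation. The following is a minimal sketch of that kind of accuracy calculation, not the authors' evaluation code. The label names, the two annotators, and all data are hypothetical toy values chosen for illustration; they are not results from the paper.

```python
def accuracy(gold, predicted):
    """Return the share of positions where the predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty annotation")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Hypothetical reference labels for four segments of one apology utterance.
gold = ["APOLOGISER", "APOLOGISING", "INTENSIFIER", "SPECIFICATION"]

# Hypothetical outputs from two annotators (toy data, not from the study).
annotator_a = ["APOLOGISER", "APOLOGISING", "SPECIFICATION", "SPECIFICATION"]
annotator_b = ["APOLOGISER", "APOLOGISING", "INTENSIFIER", "SPECIFICATION"]

scores = {
    "annotator_a": accuracy(gold, annotator_a),
    "annotator_b": accuracy(gold, annotator_b),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

A simple per-label match rate like this treats every segment equally. A fuller evaluation might also report per-category scores or a chance-corrected agreement measure, which this sketch does not attempt.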

