CL Mar 25, 2023
Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction
Steven Coyne, Keisuke Sakaguchi, Diana Galvan-Sosa et al.
GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks. However, there is a relative lack of detailed published analysis of their performance on the task of grammatical error correction (GEC). To address this, we perform experiments testing the capabilities of a GPT-3.5 model (text-davinci-003) and a GPT-4 model (gpt-4-0314) on major GEC benchmarks. We compare the performance of different prompts in both zero-shot and few-shot settings, analyzing intriguing or problematic outputs encountered with different prompt formats. We report the performance of our best prompt on the BEA-2019 and JFLEG datasets, finding that the GPT models can perform well in a sentence-level revision setting, with GPT-4 achieving a new high score on the JFLEG benchmark. Through human evaluation experiments, we compare the GPT models' corrections to source, human reference, and baseline GEC system sentences and observe differences in editing strategies and how they are scored by human raters.
CL Aug 9, 2025
Annotating Errors in English Learners' Written Language Production: Advancing Automated Written Feedback Systems
Steven Coyne, Diana Galvan-Sosa, Ryan Spring et al.
Recent advances in natural language processing (NLP) have contributed to the development of automated writing evaluation (AWE) systems that can correct grammatical errors. However, while these systems are effective at improving text, they are not optimally designed for language learning. They favor direct revisions, often with a click-to-fix functionality that can be applied without considering the reason for the correction. Meanwhile, depending on the error type, learners may benefit most from simple explanations and strategically indirect hints, especially on generalizable grammatical rules. To support the generation of such feedback, we introduce an annotation framework that models each error's error type and generalizability. For error type classification, we introduce a typology focused on inferring learners' knowledge gaps by connecting their errors to specific grammatical patterns. Following this framework, we collect a dataset of annotated learner errors and corresponding human-written feedback comments, each labeled as a direct correction or hint. With this data, we evaluate keyword-guided, keyword-free, and template-guided methods of generating feedback using large language models (LLMs). Human teachers examined each system's outputs, assessing them on grounds including relevance, factuality, and comprehensibility. We report on the development of the dataset and the comparative performance of the systems investigated.
CL Jan 20, 2012
Du TAL au TIL
Michael Zock, Guy Lapalme
Historically, two types of NLP have been investigated: fully automated processing of language by machines (NLP) and autonomous processing of natural language by people, i.e. the human brain (psycholinguistics). We believe that there is room and need for another kind, INLP: interactive natural language processing. This intermediate approach starts from people's needs, trying to bridge the gap between their actual knowledge and a given goal. Given that people's knowledge is variable and often incomplete, the aim is to build bridges linking a given knowledge state to a given goal. We present some examples, trying to show that this goal is worth pursuing and achievable at a reasonable cost.
CL Jan 20, 2012
Système d'aide à l'accès lexical : trouver le mot qu'on a sur le bout de la langue
Gaëlle Lortal, Brigitte Grau, Michael Zock
The study of the Tip of the Tongue (TOT) phenomenon provides valuable clues and insights concerning the organisation of the mental lexicon (meaning, number of syllables, relation with other words, etc.). This paper describes a tool based on psycholinguistic observations concerning the TOT phenomenon. We built it to enable a speaker/writer to find the word he is looking for, a word he may know but is unable to access in time. We try to simulate the TOT phenomenon by creating a situation where the system knows the target word, yet is unable to access it. To find the target word, we make use of the paradigmatic and syntagmatic associations stored in the linguistic databases. Our experiment allows the following conclusion: a tool like SVETLAN, capable of automatically structuring a dictionary by domain, can be used successfully to help the speaker/writer find the word he is looking for, provided it is combined with a database rich in paradigmatic links, such as EuroWordNet.
HC Jan 20, 2012
Évaluation et consolidation d'un réseau lexical via un outil pour retrouver le mot sur le bout de la langue
Alain Joubert, Mathieu Lafourcade, Didier Schwab et al.
Since September 2007, a large-scale lexical network for French has been under construction through methods based on a form of popular consensus obtained by means of games (the JeuxDeMots project). Human intervention can be considered marginal: it is limited to corrections, adjustments and validation of the senses of terms, which amounts to less than 0.5% of the relations in the network. To assess the quality of this resource built by non-expert users (players of the game), we use an approach similar to the one used for its construction: the resource must be validated by laypeople, persistently over time, on open-class vocabulary. We propose checking whether our tool is able to solve the Tip of the Tongue (TOT) problem. Like JeuxDeMots, our tool can be considered an online game, and like JeuxDeMots, it allows the acquisition of new relations, thus enriching the existing network.