A negation detection assessment of GPTs: analysis with the xNot360 dataset
This work addresses the problem of logical reliability in high-stakes domains such as healthcare, science, and law, but its contribution is incremental: it applies an existing method to new data.
The study assessed the negation detection performance of GPT models (GPT-2, GPT-3, GPT-3.5, GPT-4) using a zero-shot approach on the xNot360 dataset, finding that GPT-4 performed best but overall proficiency was modest, indicating limitations in natural language understanding.
Negation is a fundamental aspect of natural language, playing a critical role in communication and comprehension. Our study assesses the negation detection performance of Generative Pre-trained Transformer (GPT) models, specifically GPT-2, GPT-3, GPT-3.5, and GPT-4. We focus on the identification of negation in natural language using a zero-shot prediction approach applied to our custom xNot360 dataset. Our approach examines sentence pairs labeled to indicate whether the second sentence negates the first. Our findings expose a considerable performance disparity among the GPT models, with GPT-4 surpassing its counterparts and GPT-3.5 displaying a marked performance reduction. The overall proficiency of the GPT models in negation detection remains relatively modest, indicating that this task pushes the boundaries of their natural language understanding capabilities. We not only highlight the constraints of GPT models in handling negation but also emphasize the importance of logical reliability in high-stakes domains such as healthcare, science, and law.
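To make the evaluation setup concrete, the following is a minimal sketch of zero-shot negation detection over labeled sentence pairs. It is an illustration under stated assumptions, not the authors' implementation. The prompt wording, the names `SentencePair`, `build_prompt`, `parse_answer`, `evaluate`, and `keyword_baseline`, and the four toy pairs are all hypothetical and are not taken from xNot360. The `model` argument stands in for any function that maps a prompt string to a text reply, for example a wrapper around a GPT model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class SentencePair:
    first: str
    second: str
    negates: bool  # True if the second sentence negates the first


def build_prompt(pair: SentencePair) -> str:
    """Zero-shot prompt: the task is described, but no labeled examples are given.
    The wording is an assumption, not the prompt used in the study."""
    return (
        "Does Sentence 2 negate Sentence 1? Answer 1 for yes or 0 for no.\n"
        f"Sentence 1: {pair.first}\n"
        f"Sentence 2: {pair.second}\n"
        "Answer:"
    )


def parse_answer(raw: str) -> bool:
    """Map a free-text reply to a label; replies not starting with '1' or 'yes' count as no."""
    reply = raw.strip().lower()
    return reply.startswith("1") or reply.startswith("yes")


def evaluate(pairs: Iterable[SentencePair], model: Callable[[str], str]) -> Dict[str, float]:
    """Query the model once per pair and compute accuracy, precision, recall, and F1."""
    tp = fp = fn = tn = 0
    for pair in pairs:
        predicted = parse_answer(model(build_prompt(pair)))
        if predicted and pair.negates:
            tp += 1
        elif predicted and not pair.negates:
            fp += 1
        elif not predicted and pair.negates:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def keyword_baseline(prompt: str) -> str:
    """Hypothetical stand-in for a language model: answers 1 when exactly one
    of the two sentences contains the word 'not'."""
    lines = prompt.splitlines()
    first = lines[1].split(": ", 1)[1].lower().split()
    second = lines[2].split(": ", 1)[1].lower().split()
    return "1" if ("not" in first) != ("not" in second) else "0"


# Made-up pairs for illustration only; they are not drawn from xNot360.
TOY_PAIRS = [
    SentencePair("The door is open.", "The door is not open.", True),
    SentencePair("The door is open.", "The window is not open.", False),
    SentencePair("The light is on.", "The light is off.", True),
    SentencePair("It is raining.", "It is raining heavily.", False),
]

if __name__ == "__main__":
    print(evaluate(TOY_PAIRS, keyword_baseline))
```

On the four toy pairs, the keyword baseline gets one true positive, one false positive, one false negative, and one true negative, so all four metrics come out at 0.5. The two errors show why surface cues are not enough: the word "not" can appear without negating the first sentence, and a negation can be expressed without it.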