ChatGPT as a commenter to the news: can LLMs generate human-like opinions?
This work evaluates how well LLMs can mimic human opinions, a question relevant to automated content generation. Its contribution is incremental, since it confirms limitations in one specific domain.
The study investigated whether GPT-3.5 can generate human-like comments on Dutch news articles. Fine-tuned BERT models easily distinguished human-written from GPT-generated comments, none of the prompting methods performed noticeably better than the others, and human comments showed higher lexical diversity.
ChatGPT, GPT-3.5, and other large language models (LLMs) have drawn significant attention since their release, and their abilities have been investigated for a wide variety of tasks. In this research we investigate to what extent GPT-3.5 can generate human-like comments on Dutch news articles. We define human likeness as 'not distinguishable from human comments', approximated by the difficulty of automatic classification between human and GPT comments. We analyze human likeness across multiple prompting techniques. In particular, we use zero-shot, few-shot and context prompts for two generated personas. We found that our fine-tuned BERT models can easily distinguish human-written comments from GPT-3.5 generated comments, with none of the prompting methods performing noticeably better than the others. Further analysis showed that human comments consistently had higher lexical diversity than GPT-generated comments. This indicates that although generative LLMs can produce fluent text, their capability to create human-like opinionated comments is still limited.
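To make the lexical diversity comparison concrete, the sketch below computes a type-token ratio (TTR), the number of distinct words divided by the total number of words. This is an assumption for illustration: the abstract does not say which diversity measure was used, and the function names `type_token_ratio` and `mean_ttr` as well as the simple regex tokenizer are hypothetical choices, not the authors' pipeline.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens (0.0 for empty text).

    Assumption: lowercase regex word tokenization; the study's actual
    tokenizer and diversity measure are not given in the abstract.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_ttr(comments: list[str]) -> float:
    """Average TTR over a collection of comments (0.0 if the list is empty)."""
    scores = [type_token_ratio(comment) for comment in comments]
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `mean_ttr` on a set of human comments and on a set of generated comments gives two averages that can be compared directly. Plain TTR tends to drop as texts get longer, so a comparison like this is only meaningful when the two sets have similar comment lengths or when a length-corrected measure is used instead.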