Can AI Write Classical Chinese Poetry like Humans? An Empirical Study Inspired by Turing Test
This paper addresses the question of AI's creative capability in poetry writing, which is of interest to researchers and AI developers. It shows significant progress, though the contribution is incremental in that it applies existing methods to a specific domain.
The paper asks whether AI can write classical Chinese poetry as well as humans. It proposes ProFTAP, a novel evaluation framework inspired by the Turing test, and finds that recent large language models (LLMs) can produce poems nearly indistinguishable from human ones, with some open-source LLMs outperforming GPT-4.
Some argue that the essence of humanity, such as creativity and sentiment, can never be mimicked by machines. This paper casts doubt on this belief by studying a vital question: Can AI compose poetry as well as humans? To answer the question, we propose ProFTAP, a novel evaluation framework inspired by the Turing test, to assess AI's poetry writing capability. We apply it to current large language models (LLMs) and find that recent LLMs do indeed possess the ability to write classical Chinese poems nearly indistinguishable from those of humans. We also reveal that various open-source LLMs can outperform GPT-4 on this task.