Bits of Grass: Does GPT already know how to write like Whitman?
This study addresses the problem of evaluating the creative writing capabilities of AI models, which matters to both researchers and practitioners, but its contribution is incremental because it tests existing models on a new task.
The study tested GPT-3.5, GPT-3.5-turbo, and GPT-4 on generating poems in the style of specific authors, using zero-shot and many-shot prompts of up to 8192 tokens. Even with 17 example poems in the prompt, the models, which were not fine-tuned, failed to produce poetry in the desired style.
This study examines the ability of GPT-3.5, GPT-3.5-turbo (ChatGPT), and GPT-4 to generate poems in the style of specific authors using zero-shot and many-shot prompts (the latter fill the maximum context length of 8192 tokens). Using automated evaluation, we assess models that have not been fine-tuned to generate poetry in the style of specific authors. Our findings indicate that without fine-tuning, even when the prompt contains the maximum of 17 example poems (8192 tokens), these models do not generate poetry in the desired style.
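The many-shot setup described above amounts to packing as many example poems as fit into a fixed token budget before asking for a new poem. The following is a minimal sketch of that idea, not the authors' code: the function names, the prompt wording, and the `reserve_for_output` parameter are all hypothetical, and the whitespace-based `approx_token_count` is only a stand-in for a real tokenizer.

```python
from typing import Callable, List


def approx_token_count(text: str) -> int:
    """Rough stand-in for a real tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def select_examples(
    poems: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Greedily keep poems, in order, until the next one would exceed the budget."""
    chosen: List[str] = []
    for poem in poems:
        cost = count_tokens(poem)
        if cost > budget:
            break
        chosen.append(poem)
        budget -= cost
    return chosen


def build_prompt(
    author: str,
    poems: List[str],
    max_tokens: int = 8192,       # context length quoted in the abstract
    reserve_for_output: int = 0,  # hypothetical: tokens held back for the generated poem
    count_tokens: Callable[[str], int] = approx_token_count,
) -> str:
    """Return a many-shot prompt, or a zero-shot prompt if no example fits."""
    # Hypothetical prompt wording; the study's actual prompts are not given here.
    instruction = f"Write a new poem in the style of {author}."
    header = f"Here are poems by {author}:"
    budget = (
        max_tokens
        - reserve_for_output
        - count_tokens(instruction)
        - count_tokens(header)
    )
    chosen = select_examples(poems, budget, count_tokens)
    if not chosen:
        return instruction  # zero-shot: instruction only
    return "\n\n".join([header, *chosen, instruction])
```

Calling `build_prompt("Walt Whitman", poems)` with a list of poems yields a prompt holding as many examples as the budget allows; with an empty list it falls back to the zero-shot instruction. How many poems fit depends on the tokenizer and on poem length, so the count of 17 reported in the abstract should not be expected from this word-count approximation.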