LLM for Comparative Narrative Analysis
This work addresses the need for equitable comparisons of LLM performance on narrative analysis tasks, though the contribution is incremental: it applies existing methods to new models.
The paper compares the narrative analysis capabilities of three large language models (GPT-3.5, PaLM2, and Llama2) by applying identical prompts and evaluating the outputs. Human evaluation reveals notable discrepancies among their responses.
In this paper, we conducted a Multi-Perspective Comparative Narrative Analysis (CNA) on three prominent LLMs: GPT-3.5, PaLM2, and Llama2. We applied identical prompts to all three models and evaluated their outputs on specific tasks, so that the comparison across LLMs is equitable and unbiased. Our study revealed that the three LLMs generated divergent responses to the same prompt, indicating notable discrepancies in their ability to comprehend and analyze the given task. Human evaluation served as the gold standard, with outputs assessed from four perspectives to analyze differences in LLM performance.
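The protocol described above (one prompt sent unchanged to every model, followed by human ratings from several perspectives) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model callables are hypothetical stubs standing in for real API clients, the perspective names are placeholders because the four perspectives are not named here, and the ratings are made-up values used only to show the aggregation step.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a response string.
ModelFn = Callable[[str], str]


def collect_responses(models: Dict[str, ModelFn], prompt: str) -> Dict[str, str]:
    """Send the identical prompt to every model and collect the outputs."""
    return {name: fn(prompt) for name, fn in models.items()}


def aggregate_ratings(
    ratings: Dict[str, Dict[str, List[int]]]
) -> Dict[str, Dict[str, float]]:
    """Average the human ratings per model and per perspective."""
    return {
        model: {persp: mean(scores) for persp, scores in by_persp.items()}
        for model, by_persp in ratings.items()
    }


# Placeholder models (assumption): real clients would call an LLM API here.
models: Dict[str, ModelFn] = {
    "model_a": lambda p: f"A: {p}",
    "model_b": lambda p: f"B: {p}",
}
responses = collect_responses(models, "Compare the two narratives.")

# Made-up ratings from two hypothetical annotators; NOT results from the paper.
ratings = {
    "model_a": {"perspective_1": [4, 5], "perspective_2": [3, 3]},
    "model_b": {"perspective_1": [2, 4], "perspective_2": [5, 4]},
}
summary = aggregate_ratings(ratings)

for model, scores in summary.items():
    print(model, scores)
```

Keeping response collection separate from rating aggregation means the prompt stays fixed across models, while human judgments can be gathered and averaged independently of how each model is queried.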