CL, Jun 7, 2023

Long-form analogies generated by chatGPT lack human-like psycholinguistic properties

arXiv:2306.04537v1, 21 citations, h-index: 25
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of evaluating LLM output quality for educational or linguistic applications, but it is incremental as it applies existing methods to a new dataset.

The study applied psycholinguistic methods to compare long-form analogies written by human students with analogies generated by ChatGPT. A classifier using 78 Coh-Metrix features separated the two sources with high performance, and a follow-up analysis of those features revealed several linguistic differences between them.

Psycholinguistic analyses provide a means of evaluating large language model (LLM) output and making systematic comparisons to human-generated text. These methods can be used to characterize the psycholinguistic properties of LLM output and illustrate areas where LLMs fall short in comparison to human-generated text. In this work, we apply psycholinguistic methods to evaluate individual sentences from long-form analogies about biochemical concepts. We compare analogies generated by human subjects enrolled in introductory biochemistry courses to analogies generated by chatGPT. We perform a supervised classification analysis using 78 features extracted from Coh-Metrix that analyze text cohesion, language, and readability (Graesser et al., 2004). Results illustrate high performance for classifying student-generated and chatGPT-generated analogies. To evaluate which features contribute most to model performance, we use a hierarchical clustering approach. Results from this analysis illustrate several linguistic differences between the two sources.
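
The two analysis steps named in the abstract (supervised classification on 78 features, then hierarchical clustering to see which features matter) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline. The data are synthetic stand-ins for Coh-Metrix output, and the logistic regression classifier, correlation-based feature distance, average linkage, cut into 13 clusters, and per-cluster refitting are all assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a Coh-Metrix table: rows are sentences, columns are
# 78 features. Real values would come from Coh-Metrix output, not from here.
n_per_class, n_factors, per_factor = 200, 13, 6           # 13 * 6 = 78 features
latent = rng.normal(size=(2 * n_per_class, n_factors))
loadings = np.repeat(np.eye(n_factors), per_factor, axis=1)
X = latent @ loadings + 0.5 * rng.normal(size=(2 * n_per_class, 78))
y = np.array([0] * n_per_class + [1] * n_per_class)       # hypothetical: 0 = student, 1 = chatGPT
X[y == 1, :12] += 2.0                                      # only the first 12 features differ by source

# Step 1: supervised classification on all 78 features (assumed classifier).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
full_acc = cross_val_score(clf, X, y, cv=5).mean()

# Step 2: hierarchical clustering of features, using 1 - |correlation| as the
# distance so that strongly related features end up in the same cluster.
corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)
dist = (dist + dist.T) / 2.0
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
cluster_ids = fcluster(Z, t=n_factors, criterion="maxclust")

# Step 3: refit the classifier on each feature cluster alone and rank the
# clusters by cross-validated accuracy.
cluster_acc = {}
for c in np.unique(cluster_ids):
    cols = np.where(cluster_ids == c)[0]
    cluster_acc[int(c)] = cross_val_score(clf, X[:, cols], y, cv=5).mean()
ranked = sorted(cluster_acc.items(), key=lambda kv: kv[1], reverse=True)

print(f"accuracy with all 78 features: {full_acc:.2f}")
for c, acc in ranked[:3]:
    print(f"cluster {c}: {np.sum(cluster_ids == c)} features, accuracy {acc:.2f}")
```

Grouping correlated features before ranking them avoids splitting credit among near-duplicate measures, which is one plausible reason to pair clustering with a classification analysis. The abstract does not say how the authors combined the two steps.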

