CL AP · Apr 28, 2025

Enhancing Systematic Reviews with Large Language Models: Using GPT-4 and Kimi

Dandan Chen Kaptur, Yue Huang, Xuejun Ryan Ji, Yanhui Guo, Bradley Kaptur

arXiv:2504.20276v1 · 2.71 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the efficiency of systematic reviews for researchers, but it is incremental as it applies existing LLMs to a specific domain.

The study investigated GPT-4 and Kimi for systematic reviews by comparing their generated codes to human codes, finding that LLM performance varies with data volume and question complexity.

This research examined two Large Language Models (LLMs), GPT-4 and Kimi, for use in systematic reviews. We evaluated their performance by comparing LLM-generated codes with human-generated codes from a peer-reviewed systematic review on assessment. Our findings suggest that LLM performance in systematic reviews varies with data volume and question complexity.
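The abstract does not say which agreement measure was used to compare LLM-generated and human-generated codes. As a minimal sketch of what such a comparison could look like, the Python below computes raw percent agreement and Cohen's kappa for one coding question. The function names, the label values, and the choice of these two metrics are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def percent_agreement(human_codes, llm_codes):
    """Share of items on which the two coders assigned the same code."""
    if len(human_codes) != len(llm_codes) or not human_codes:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human_codes, llm_codes))
    return matches / len(human_codes)


def cohens_kappa(human_codes, llm_codes):
    """Agreement corrected for the agreement expected by chance."""
    n = len(human_codes)
    observed = percent_agreement(human_codes, llm_codes)
    human_counts = Counter(human_codes)
    llm_counts = Counter(llm_codes)
    labels = set(human_counts) | set(llm_counts)
    # Chance agreement: product of each coder's marginal label proportions.
    expected = sum(human_counts[c] * llm_counts[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for four studies on a single yes/no coding question.
human = ["yes", "yes", "no", "no"]
llm = ["yes", "no", "no", "no"]

print(percent_agreement(human, llm))  # 0.75
print(cohens_kappa(human, llm))       # 0.5
```

Running the same comparison separately for each coding question, and for different numbers of input studies, is one way to see how agreement shifts with question complexity and data volume.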
