Enhancing Systematic Reviews with Large Language Models: Using GPT-4 and Kimi
This work targets the efficiency of systematic reviews for researchers, but its contribution is incremental because it applies existing LLMs to a specific domain.
The study investigated GPT-4 and Kimi for systematic reviews by comparing their generated codes to human codes, finding that LLM performance varies with data volume and question complexity.
This research investigated two Large Language Models (LLMs), GPT-4 and Kimi, for systematic reviews. We evaluated their performance by comparing LLM-generated codes with human-generated codes from a peer-reviewed systematic review on assessment. Our findings suggest that LLM performance varies with data volume and question complexity in systematic reviews.
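
The comparison of LLM-generated codes against human-generated codes can be illustrated with a minimal sketch. The abstract does not state which agreement measure was used, so the two measures below (raw percent agreement and Cohen's kappa) are assumptions chosen because they are common for comparing two coders. The function names and the example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(human, llm):
    """Share of items on which the human code and the LLM code match."""
    if len(human) != len(llm):
        raise ValueError("both coders must label the same items")
    if not human:
        raise ValueError("at least one coded item is required")
    matches = sum(h == m for h, m in zip(human, llm))
    return matches / len(human)


def cohens_kappa(human, llm):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    n = len(human)
    p_observed = percent_agreement(human, llm)
    human_counts = Counter(human)
    llm_counts = Counter(llm)
    # Expected agreement if both coders assigned codes independently
    # according to their own marginal code frequencies.
    p_expected = sum(
        human_counts[code] * llm_counts[code] for code in human_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Both coders used a single identical code for every item.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical codes for six articles (illustrative only).
    human = ["formative", "summative", "formative", "peer", "summative", "formative"]
    llm = ["formative", "summative", "peer", "peer", "summative", "summative"]
    print(f"{percent_agreement(human, llm):.2f}")  # 0.67
    print(f"{cohens_kappa(human, llm):.2f}")       # 0.52
```

Reporting a chance-corrected measure alongside raw agreement is one way to make comparisons across questions with different numbers of code categories more meaningful, since raw agreement tends to be higher when few categories are available.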