CL · Oct 5, 2025

Evaluation of Clinical Trials Reporting Quality using Large Language Models

arXiv:2510.04338v1
h-index: 20
Originality: Synthesis-oriented
AI Analysis

This work addresses reporting quality in clinical trials, which can affect clinical decisions, but it is incremental because it applies existing methods to a new domain.

The study tested large language models for evaluating clinical trial reporting quality using CONSORT standards, achieving 85% accuracy with the best model and prompting method.

Reporting quality is an important topic in clinical trial research articles, as it can impact clinical decisions. In this article, we test the ability of large language models to assess the reporting quality of this type of article using the Consolidated Standards of Reporting Trials (CONSORT). We create CONSORT-QA, an evaluation corpus from two studies on abstract reporting quality with CONSORT-abstract standards. We then evaluate the ability of different large generative language models (from the general domain or adapted to the biomedical domain) to correctly assess CONSORT criteria with different known prompting methods, including Chain-of-thought. Our best combination of model and prompting method achieves 85% accuracy. Using Chain-of-thought adds valuable information on the model's reasoning for completing the task.
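The accuracy figure above is the share of CONSORT criterion judgments where the model agrees with the reference annotation. A minimal sketch of that scoring step follows, assuming each judgment is a yes/no label keyed by abstract and criterion. The function name `score_consort`, the data layout, and the toy labels are illustrative assumptions, not the paper's actual code or data.

```python
from collections import defaultdict


def score_consort(gold, predicted):
    """Return (overall accuracy, per-criterion accuracy).

    gold, predicted: dicts mapping (abstract_id, criterion) -> bool,
    where True means the criterion is judged as reported.
    This layout is a hypothetical one chosen for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # criterion -> [correct, total]
    for (abstract_id, criterion), gold_label in gold.items():
        counts[criterion][1] += 1
        # A missing prediction counts as an error.
        if predicted.get((abstract_id, criterion)) == gold_label:
            counts[criterion][0] += 1
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    overall = correct / total if total else 0.0
    per_criterion = {name: c / t for name, (c, t) in counts.items()}
    return overall, per_criterion


# Toy labels, invented purely to exercise the function.
gold = {
    ("a1", "randomization"): True,
    ("a1", "blinding"): False,
    ("a2", "randomization"): False,
    ("a2", "blinding"): True,
}
predicted = {
    ("a1", "randomization"): True,
    ("a1", "blinding"): False,
    ("a2", "randomization"): True,
    ("a2", "blinding"): True,
}

overall, per_criterion = score_consort(gold, predicted)
print(overall)        # 0.75
print(per_criterion)  # {'randomization': 0.5, 'blinding': 1.0}
```

Breaking the score down per criterion can show whether errors cluster on particular CONSORT items, which a single overall accuracy would hide.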
