Assisting Research Proposal Writing with Large Language Models: Evaluation and Refinement
This work addresses ethical concerns and improves the reliability of LLM-assisted academic writing, though the contribution is incremental because it builds on existing evaluation methods.
The study tackles the problem of evaluating and improving research proposals written by large language models. It proposes quantitative metrics for content quality and reference validity, together with an iterative prompting method; in the reported experiments, the method significantly enhanced content quality and reduced reference inaccuracies.
Large language models (LLMs) like ChatGPT are increasingly used in academic writing, yet issues such as incorrect or fabricated references raise ethical concerns. Moreover, current evaluations of content quality often rely on subjective human judgment, which is labor-intensive and lacks objectivity, potentially compromising the consistency and reliability of the assessment. In this study, to evaluate LLMs quantitatively and enhance their research proposal writing capabilities, we propose two key evaluation metrics, content quality and reference validity, and an iterative prompting method based on the scores derived from these two metrics. Our extensive experiments show that the proposed metrics provide an objective, quantitative framework for assessing ChatGPT's writing performance. Additionally, iterative prompting significantly enhances content quality while reducing reference inaccuracies and fabrications, addressing critical ethical challenges in academic contexts.
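The score-driven iterative prompting described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `refine_proposal`, the `Draft` record, the 0-to-1 score scale, the `target` threshold, the `max_rounds` cap, and the prompt wording are all hypothetical, and the three callables (`generate`, `score_content`, `score_references`) stand in for an LLM call and for the two metrics, whose actual definitions the abstract does not give.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Draft:
    text: str
    content_score: float    # assumed scale: 0.0 (poor) to 1.0 (excellent)
    reference_score: float  # assumed scale: 0.0 to 1.0, higher means more valid references


def refine_proposal(
    topic: str,
    generate: Callable[[str], str],            # hypothetical LLM wrapper: prompt -> text
    score_content: Callable[[str], float],     # hypothetical content-quality metric
    score_references: Callable[[str], float],  # hypothetical reference-validity metric
    target: float = 0.8,                       # assumed stopping threshold
    max_rounds: int = 3,                       # assumed cap on re-prompting
) -> List[Draft]:
    """Re-prompt until both scores reach `target` or `max_rounds` is used up."""
    prompt = f"Write a research proposal on: {topic}"
    history: List[Draft] = []
    for _ in range(max_rounds):
        text = generate(prompt)
        draft = Draft(text, score_content(text), score_references(text))
        history.append(draft)
        if draft.content_score >= target and draft.reference_score >= target:
            break
        # Feed both scores back so the next attempt can target the weaker metric.
        prompt = (
            f"Revise the research proposal below on: {topic}\n"
            f"Content quality score: {draft.content_score:.2f}\n"
            f"Reference validity score: {draft.reference_score:.2f}\n"
            "Improve the weaker aspect and cite only references that exist.\n\n"
            f"{draft.text}"
        )
    return history
```

Returning the full history rather than only the last draft makes it possible to compare scores across rounds, which is how an improvement from iteration would be observed in practice.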