AI | Apr 4, 2023

GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam

arXiv:2304.01503v1 | 8 citations | h-index: 3

Originality: Synthesis-oriented

AI Analysis

This paper addresses the competency of AI models on a specialized medical exam for plastic surgery residents, though the contribution is incremental because it compares existing models on new data.

The paper evaluated GPT-4's performance on the Plastic Surgery In-Service Training Exam (PSITE), showing dramatic improvement over GPT-3.5, with scores increasing from the 8th to 88th percentile on the 2022 exam and from the 3rd to 99th percentile on the 2021 exam.

The Plastic Surgery In-Service Training Exam (PSITE) is an important indicator of resident proficiency and serves as a useful benchmark for evaluating OpenAI's GPT. Unlike many of the simulated tests or practice questions shown in the GPT-4 Technical Paper, the multiple-choice questions evaluated here are authentic PSITE questions. These questions offer realistic clinical vignettes of the kind a plastic surgeon commonly encounters in practice, and PSITE scores correlate highly with passing the written boards required to become a Board Certified Plastic Surgeon. Our evaluation shows dramatic improvement of GPT-4 (without vision) over GPT-3.5: the score rises from the 8th to the 88th percentile on the 2022 exam and from the 3rd to the 99th percentile on the 2021 exam. The final results of the 2023 PSITE are set to be released on April 11, 2023, and this is an exciting moment to continue our research with a fresh exam. Our evaluation pipeline is ready for the moment the exam is released, so long as we have access to the GPT-4 API via OpenAI. With multimodal input, we may achieve superhuman performance on the 2023 exam.
