CY · Sep 25, 2023
ChatGPT Performance on Standardized Testing Exam -- A Proposed Strategy for Learners
Umer Farooq, Saira Anwar
This study explores the problem-solving capabilities of ChatGPT and its prospective applications in standardized test preparation, focusing on the GRE quantitative exam. Prior research has shown great potential for using ChatGPT for academic purposes and for changing how students study across various disciplines. We investigate how ChatGPT performs across various question types in the GRE quantitative domain, and how modifying question prompts impacts its accuracy. More specifically, this study addresses two research questions: (1) How does ChatGPT perform in answering GRE-based quantitative questions across various content areas? (2) How does the accuracy of ChatGPT vary with modifying the question prompts? The dataset, consisting of 100 randomly selected GRE quantitative questions, was collected from the ETS official guide to GRE test preparation. We used quantitative evaluation to answer our first research question, and a t-test to examine the statistical association between prompt modification and ChatGPT's accuracy. Results show a statistically significant improvement in ChatGPT's accuracy after applying instruction priming and contextual prompts to the original questions. ChatGPT showed 84% accuracy with the modified prompts compared to 69% with the original questions. The study discusses the areas where ChatGPT struggled with certain questions, explains how prompt modifications can help learners prepare for standardized tests like the GRE, and provides future directions for prompt modifications.
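The t-test comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not say whether the test was paired or independent, so a paired design (the same questions scored under both prompt versions) is assumed here. The function name `paired_t_statistic` and the toy scores are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(original, modified):
    """Return (t, degrees_of_freedom) for paired 0/1 correctness scores.

    Assumption: each position in both lists refers to the same question,
    scored 1 if the answer was correct and 0 otherwise.
    """
    if len(original) != len(modified):
        raise ValueError("both lists must score the same questions")
    diffs = [m - o for o, m in zip(original, modified)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Toy scores for eight questions, made up for illustration only.
original_scores = [1, 0, 0, 1, 0, 1, 0, 1]
modified_scores = [1, 1, 0, 1, 1, 1, 1, 1]

t_value, dof = paired_t_statistic(original_scores, modified_scores)
print(f"t = {t_value:.3f}, df = {dof}")
```

The resulting statistic would then be compared against the t distribution with the reported degrees of freedom to obtain a p-value.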
CL · Jan 26, 2025
Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics
Jin Hyun Park, Utsawb Laminchhane, Umer Farooq et al.
Large language models (LLMs) are increasingly utilized for machine translation, yet their predictions often exhibit uncertainties that hinder interpretability and user trust. Effectively visualizing these uncertainties can enhance the usability of LLM outputs, particularly in contexts where translation accuracy is critical. This paper addresses two primary objectives: (1) providing users with token-level insights into model confidence and (2) developing a web-based visualization tool to quantify and represent translation uncertainties. To achieve these goals, we utilized the T5 model with the WMT19 dataset for translation tasks and evaluated translation quality using established metrics such as BLEU, METEOR, and ROUGE. We introduced three novel uncertainty quantification (UQ) metrics: (1) the geometric mean of token probabilities, (2) the arithmetic mean of token probabilities, and (3) the arithmetic mean of the kurtosis of token distributions. These metrics provide a simple yet effective framework for evaluating translation performance. Our analysis revealed a linear relationship between the traditional evaluation metrics and our UQ metrics, demonstrating the validity of our approach. Additionally, we developed an interactive web-based visualization that uses a color gradient to represent token confidence. This tool offers users a clear and intuitive understanding of translation quality while providing valuable insights into model performance. Overall, we show that our UQ metrics and visualization are both robust and interpretable, offering practical tools for evaluating and assessing machine translation systems.
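The three UQ metrics named in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not define the kurtosis precisely, so it is assumed here to be the non-excess population kurtosis (fourth central moment divided by the squared variance) of each token's probability vector over the vocabulary. All function names are hypothetical.

```python
import math


def geometric_mean_prob(token_probs):
    """Geometric mean of the probabilities of the generated tokens."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def arithmetic_mean_prob(token_probs):
    """Arithmetic mean of the probabilities of the generated tokens."""
    return sum(token_probs) / len(token_probs)


def kurtosis(values):
    """Non-excess population kurtosis: m4 / m2**2 (an assumed definition)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant vector")
    return m4 / (m2 ** 2)


def mean_kurtosis(token_distributions):
    """Arithmetic mean of the per-token distribution kurtosis values."""
    scores = [kurtosis(dist) for dist in token_distributions]
    return sum(scores) / len(scores)


# Toy inputs for illustration only: two generated tokens, vocabulary of four.
chosen_token_probs = [0.25, 1.0]
distributions = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]]

print(geometric_mean_prob(chosen_token_probs))
print(arithmetic_mean_prob(chosen_token_probs))
print(mean_kurtosis(distributions))
```

The geometric mean penalizes a single low-probability token more heavily than the arithmetic mean does, which is one reason to report both.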