Joost de Winter

2papers

2 Papers

CYSep 19, 2024

System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam

Joost de Winter, Dimitra Dodou, Yke Bauke Eisma

The processes underlying human cognition are often divided into System 1, which involves fast, intuitive thinking, and System 2, which involves slow, deliberate reasoning. Previously, large language models were criticized for lacking the deeper, more analytical capabilities of System 2. In September 2024, OpenAI introduced the o1 model series, designed to handle System 2-like reasoning. While OpenAI's benchmarks are promising, independent validation is still needed. In this study, we tested the o1-preview model twice on the Dutch 'Mathematics B' final exam. It scored a near-perfect 76 and 74 out of 76 points. For context, only 24 out of 16,414 students in the Netherlands achieved a perfect score. By comparison, the GPT-4o model scored 66 and 62 out of 76, well above the Dutch students' average of 40.63 points. Neither model had access to the exam figures. Since there was a risk of model contami-nation (i.e., the knowledge cutoff for o1-preview and GPT-4o was after the exam was published online), we repeated the procedure with a new Mathematics B exam that was published after the cutoff date. The results again indicated that o1-preview performed strongly (97.8th percentile), which suggests that contamination was not a factor. We also show that there is some variability in the output of o1-preview, which means that sometimes there is 'luck' (the answer is correct) or 'bad luck' (the output has diverged into something that is incorrect). We demonstrate that the self-consistency approach, where repeated prompts are given and the most common answer is selected, is a useful strategy for identifying the correct answer. It is concluded that while OpenAI's new model series holds great potential, certain risks must be considered.

HCMar 27, 2021Code

Using Eye-tracking Data to Predict Situation Awareness in Real Time during Takeover Transitions in Conditionally Automated Driving

Feng Zhou, X. Jessie Yang, Joost de Winter

Situation awareness (SA) is critical to improving takeover performance during the transition period from automated driving to manual driving. Although many studies measured SA during or after the driving task, few studies have attempted to predict SA in real time in automated driving. In this work, we propose to predict SA during the takeover transition period in conditionally automated driving using eye-tracking and self-reported data. First, a tree ensemble machine learning model, named LightGBM (Light Gradient Boosting Machine), was used to predict SA. Second, in order to understand what factors influenced SA and how, SHAP (SHapley Additive exPlanations) values of individual predictor variables in the LightGBM model were calculated. These SHAP values explained the prediction model by identifying the most important factors and their effects on SA, which further improved the model performance of LightGBM through feature selection. We standardized SA between 0 and 1 by aggregating three performance measures (i.e., placement, distance, and speed estimation of vehicles with regard to the ego-vehicle) of SA in recreating simulated driving scenarios, after 33 participants viewed 32 videos with six lengths between 1 and 20 s. Using only eye-tracking data, our proposed model outperformed other selected machine learning models, having a root-mean-squared error (RMSE) of 0.121, a mean absolute error (MAE) of 0.096, and a 0.719 correlation coefficient between the predicted SA and the ground truth. The code is available at https://github.com/refengchou/Situation-awareness-prediction. Our proposed model provided important implications on how to monitor and predict SA in real time in automated driving using eye-tracking data.