Bob de Ruiter

CL
3 papers
22 citations
Novelty: 20%
AI Score: 17

3 Papers

DL Nov 25, 2023
Automatically Finding and Categorizing Replication Studies

Bob de Ruiter

In many fields of experimental science, papers that failed to replicate continue to be cited because replication studies are hard to discover. As a first step toward a system that automatically finds replication studies for a given paper, 334 replication studies and 344 replicated studies were collected. Replication studies could be identified in the dataset from their text content better than chance (AUROC = 0.886). Successful replication studies could also be distinguished from failed replication studies better than chance (AUROC = 0.664).
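For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance. The following is a minimal sketch of that pairwise definition, not the paper's evaluation code; the function name `auroc` and the toy labels and scores are invented for illustration.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = replication study, 0 = other paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# 8 of the 9 positive/negative pairs are ordered correctly.
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a dataset of a few hundred papers; rank-based implementations give the same value more efficiently.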

LG May 14, 2021
Post-processing Multi-Model Medium-Term Precipitation Forecasts Using Convolutional Neural Networks

Bob de Ruiter

The goal of this study was to improve the post-processing of precipitation forecasts using convolutional neural networks (CNNs). Instead of post-processing forecasts on a per-pixel basis, as is usually done when employing machine learning in meteorological post-processing, input forecast images were combined and transformed into probabilistic output forecast images using fully convolutional neural networks. CNNs did not outperform regularized logistic regression. Additionally, an ablation analysis was performed. Combining input forecasts from a global low-resolution weather model and a regional high-resolution weather model improved performance over using either one alone.
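The contrast between per-pixel and convolutional post-processing can be sketched as follows. This is an illustration only, not the architecture from the paper: the function names, the single 3x3 layer, the zero padding, and all weights and grid values are assumptions. A per-pixel logistic regression sees only the co-located values of the two input forecasts, whereas a convolution also mixes in neighbouring cells.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def per_pixel(global_fc, regional_fc, w_global, w_regional, bias):
    """Logistic regression applied independently at every grid cell."""
    return [
        [sigmoid(w_global * g + w_regional * r + bias) for g, r in zip(g_row, r_row)]
        for g_row, r_row in zip(global_fc, regional_fc)
    ]


def conv_layer(channels, kernels, bias):
    """One 3x3 convolution over stacked input grids (zero padding), then a sigmoid."""
    height, width = len(channels[0]), len(channels[0][0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = bias
            for channel, kernel in zip(channels, kernels):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < height and 0 <= jj < width:
                            total += kernel[di + 1][dj + 1] * channel[ii][jj]
            out[i][j] = sigmoid(total)
    return out


# Hypothetical 3x3 precipitation grids from a global and a regional model.
global_fc = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5], [1.0, 2.0, 3.0]]
regional_fc = [[0.2, 0.8, 2.4], [0.4, 1.9, 2.2], [1.1, 1.7, 3.3]]

# Made-up weights: each kernel averages a cell with its neighbours.
smooth = [[1.0 / 9.0] * 3 for _ in range(3)]
probabilities = conv_layer([global_fc, regional_fc], [smooth, smooth], bias=-2.0)
for row in probabilities:
    print([round(p, 3) for p in row])
```

With kernels that are zero everywhere except the centre, `conv_layer` reduces to `per_pixel`, which makes the per-pixel model a special case of the convolutional one.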

CL Nov 19, 2018
The Mafiascum Dataset: A Large Text Corpus for Deception Detection

Bob de Ruiter, George Kachergis

Detecting deception in natural language has a wide variety of applications, but because of its hidden nature there are currently no public, large-scale sources of labeled deceptive text. This work introduces the Mafiascum dataset [1], a collection of over 700 games of Mafia, in which players are randomly assigned either deceptive or non-deceptive roles and then interact via forum postings. Over 9000 documents were compiled from the dataset, each containing all messages written by a single player in a single game. This corpus was used to construct a set of hand-picked linguistic features based on prior deception research, as well as a set of average word vectors enriched with subword information. A logistic regression classifier fit on a combination of these feature sets achieved an average precision of 0.39 (chance = 0.26) and an AUROC of 0.68 on documents of 5000+ words. On documents of 50+ words, it achieved an average precision of 0.29 (chance = 0.23) and an AUROC of 0.59. [1] https://bitbucket.org/bopjesvla/thesis/src
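Average precision is reported here next to a chance value because, unlike AUROC, its baseline depends on class balance: a random ranking scores roughly the fraction of positive examples. Reading the reported chance values as the share of deceptive-role documents is an assumption, not something the abstract states. The sketch below shows the computation on invented toy data; `average_precision` and `chance_level` are hypothetical helper names, not code from the linked repository.

```python
def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    # Rank documents by descending score; ties keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


def chance_level(labels):
    """Positive-class fraction, the approximate score of a random ranking."""
    return sum(labels) / len(labels)


# Toy data (hypothetical): 1 = deceptive role, 0 = non-deceptive role.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(average_precision(labels, scores))  # positives sit at ranks 1, 3 and 7
print(chance_level(labels))               # 3 of 8 documents are positive
```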