Measuring Sentence-Level and Aspect-Level (Un)certainty in Science Communications
This work addresses the need for better modeling of certainty in science communication, though its contribution is incremental: it builds on existing language models and annotation methods.
The authors model certainty in science communication by introducing a new dataset of 2167 annotated scientific findings. They show that hedges alone only partially explain certainty, and that both overall certainty and its individual aspects can be predicted with pre-trained language models.
Certainty and uncertainty are fundamental to science communication. Hedges have widely been used as proxies for uncertainty. However, certainty is a complex construct, with authors expressing not only the degree but the type and aspects of uncertainty in order to give the reader a certain impression of what is known. Here, we introduce a new study of certainty that models both the level and the aspects of certainty in scientific findings. Using a new dataset of 2167 annotated scientific findings, we demonstrate that hedges alone account for only a partial explanation of certainty. We show that both the overall certainty and individual aspects can be predicted with pre-trained language models, providing a more complete picture of the author's intended communication. Downstream analyses on 431K scientific findings from news and scientific abstracts demonstrate that modeling sentence-level and aspect-level certainty is meaningful for areas like science communication. Both the model and datasets used in this paper are released at https://blablablab.si.umich.edu/projects/certainty/.
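To make the hedge-as-proxy idea concrete, the following is a minimal sketch of a hedge-rate baseline of the kind the abstract argues is insufficient. The lexicon `HEDGES` and the function `hedge_rate` are hypothetical names introduced here for illustration; they are not the authors' word list, model, or released code.

```python
import re

# Hypothetical, deliberately tiny hedge lexicon (an assumption for this
# sketch; it is NOT the lexicon or method used in the paper).
HEDGES = {
    "may", "might", "could", "possibly", "likely",
    "suggest", "suggests", "appear", "appears",
}

def hedge_rate(sentence: str) -> float:
    """Return the fraction of word tokens that are in the hedge lexicon."""
    # Lowercase and keep alphabetic tokens only; digits and punctuation drop out.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(token in HEDGES for token in tokens) / len(tokens)

if __name__ == "__main__":
    for finding in (
        "Coffee may reduce risk",
        "Coffee reduces risk in a small sample of 12 mice",
    ):
        print(f"{hedge_rate(finding):.2f}  {finding}")
```

Under this sketch the second finding scores zero because it contains no word from the lexicon, even though a reader might take the small sample as a qualification of the claim. That gap is one illustration, under the assumptions above, of why a hedge count gives only a partial account of certainty.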