CL May 31, 2021

How Lexical Gold Standards Have Effects On The Usefulness Of Text Analysis Tools For Digital Scholarship

arXiv:2105.14921v1 0.53 citations

Originality Synthesis-oriented

AI Analysis

The paper addresses a misalignment between evaluation standards and the needs of the digital humanities and social sciences, but its contribution is incremental: it critiques existing practices without proposing a new solution.

The paper argues that current lexical gold standards are biased towards topical relevance, which misaligns with digital scholarship needs in humanities and social sciences, and calls for more systematic formulation of requirements and explicit assumptions in model design.

This paper describes how current lexical similarity and analogy gold standards are built to conform to certain ideas about what the models they are designed to evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies, and while this has proven a useful starting point for much of what information technology is used for, it does not always align well with other uses to which these technologies are being put, most notably use cases from digital scholarship in the humanities or social sciences. This paper argues for more systematic formulation of requirements from the digital humanities and social sciences and more explicit description of the assumptions underlying model design.
