Principal Components of the Meaning
This work addresses the challenge of quantifying meaning in scientific literature, for researchers in computational linguistics and information science. It appears incremental, however, because it applies existing methods to a specific domain.
The authors tackle the problem of representing lexical meaning in scientific texts by constructing a 13-dimensional Meaning Space, using principal component analysis on word-category data from the Web of Science. They show that the reduced word set plausibly represents the entire corpus, and they hypothesise about the qualitative interpretation of the components.
In this paper we argue that (lexical) meaning in science can be represented in a 13-dimensional Meaning Space. This space is constructed using principal component analysis (singular value decomposition) on the matrix of word-category relative information gains, where the categories are those used by the Web of Science and the words are taken from a reduced word set drawn from texts in the Web of Science. We show that this reduced word set plausibly represents all texts in the corpus, so that the principal component analysis has some objective meaning with respect to the corpus. We argue that 13 dimensions are adequate to describe the meaning of scientific texts, and hypothesise about the qualitative meaning of the principal components.
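The construction described above can be illustrated with a minimal sketch of principal component analysis computed through a singular value decomposition. Several things here are assumptions and not taken from the paper: the function name `principal_components` is hypothetical, the matrix is filled with synthetic random numbers instead of real relative information gains, its size (200 words by 40 categories) is arbitrary, and centring each category column before the decomposition is one common convention that the paper may or may not follow. Only the number of retained components, 13, comes from the text.

```python
import numpy as np


def principal_components(matrix, n_components):
    """PCA via SVD on a (words x categories) matrix.

    Hypothetical helper for illustration only; column centring is an
    assumption, not something stated in the paper.
    """
    # Centre each category column so components describe variation around the mean.
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    # Thin SVD: centered = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Fraction of total variance carried by each component.
    explained = s**2 / np.sum(s**2)
    # Coordinates of each word in the reduced space.
    scores = u[:, :n_components] * s[:n_components]
    # Rows of `components` are the principal directions in category space.
    components = vt[:n_components]
    return scores, components, explained


# Synthetic placeholder data standing in for word-category information gains.
rng = np.random.default_rng(0)
rig = rng.random((200, 40))

scores, components, explained = principal_components(rig, 13)
print(scores.shape, components.shape)
print("variance retained by 13 components:", explained[:13].sum())
```

In this sketch each word becomes a 13-coordinate vector (a row of `scores`), and the cumulative value of `explained` is one quantity that could inform a judgement about how many dimensions are adequate. The paper's own criterion for settling on 13 is not reproduced here.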