Language-based Examples in the Statistics Classroom
This work addresses the need for engaging and accessible teaching methods in statistics education; its contribution is incremental, applying existing statistical tools to new types of examples.
The paper tackles the challenge of diversifying statistics pedagogy by introducing language-based examples, such as wordplay patterns, idiomatic word pairs, and pangram analysis, to illustrate statistical concepts using real-world text data.
Statistics pedagogy values using a variety of examples. Because text resources are readily available on the Web and statistical packages can analyze string data, it is now easy to use language-based examples in a statistics class. Three such examples are discussed here. First, many types of wordplay (e.g., crosswords and hangman) involve finding words whose letters satisfy a certain pattern. Second, linguistics has shown that idiomatic pairs of words often appear together more frequently than expected by chance. For example, in the Brown Corpus, this is true of the phrasal verb "to throw up" (p-value = 7.92E-10). Third, a pangram contains every letter of the alphabet at least once. Pangrams are searched for in Charles Dickens' A Christmas Carol, and their lengths are compared with the expected value given by the unequal-probability coupon collector's problem and with simulations.
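The comparison in the third example can be sketched in a few lines of Python. This is a minimal illustration and not the paper's own code: the function names expected_draws and simulate_draws are hypothetical, and the four-symbol alphabet with its probabilities is made up so that the exact inclusion-exclusion sum stays small. It assumes symbols are drawn independently with fixed, unequal probabilities, which is only an approximation to real text.

```python
import random
from itertools import combinations


def expected_draws(probs):
    """Exact expected number of draws needed to see every symbol at least once.

    Uses inclusion-exclusion over all nonempty subsets S of symbols:
    E[T] = sum over S of (-1)^(|S|+1) / (sum of p_i for i in S).
    """
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(probs, k):
            total += sign / sum(subset)
    return total


def simulate_draws(probs, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity (seeded for repeatability)."""
    rng = random.Random(seed)
    symbols = range(len(probs))
    total = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        # Keep drawing until every symbol has appeared at least once.
        while len(seen) < len(probs):
            seen.add(rng.choices(symbols, weights=probs)[0])
            draws += 1
        total += draws
    return total / trials


if __name__ == "__main__":
    # Hypothetical toy alphabet; these probabilities are NOT letter frequencies.
    toy_probs = [0.50, 0.25, 0.15, 0.10]
    print("exact expectation:", expected_draws(toy_probs))
    print("simulated mean:   ", simulate_draws(toy_probs))
```

With all 26 letters the subset sum has 2^26 - 1 terms, so the brute-force loop above becomes slow in pure Python; the toy alphabet is used here only to keep the sketch readable.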