Efficient Calculation of Bigram Frequencies in a Corpus of Short Texts
This work addresses a computational bottleneck for researchers and practitioners working with short-text data, but the contribution is incremental: it modifies an existing method for a specific scenario.
The authors tackle the problem of calculating bigram frequencies in corpora of short texts. They show that an existing efficient method is unsuitable for such corpora and propose a simple alternative with the same computational complexity that yields exact counts instead of approximations.
We show that an efficient and popular method for calculating bigram frequencies is unsuitable for bodies of short texts and offer a simple alternative. Our method has the same computational complexity as the old method and offers an exact count instead of an approximation.
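The abstract does not describe either method in detail, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes that one way an approximation can arise is by treating the whole corpus as a single token stream, which creates bigrams that span the boundary between two texts. The function names, the whitespace tokenizer, and the toy corpus are all hypothetical. Both functions make one linear pass over the tokens.

```python
from collections import Counter
from itertools import chain


def bigrams_per_text(texts):
    """Count bigrams inside each text only, so no bigram spans two texts."""
    counts = Counter()
    for text in texts:
        tokens = text.split()  # assumption: simple whitespace tokenization
        counts.update(zip(tokens, tokens[1:]))
    return counts


def bigrams_concatenated(texts):
    """Count bigrams over the corpus treated as one long token stream.

    Assumed stand-in for an approximate method: adjacent texts contribute
    spurious bigrams at every boundary.
    """
    tokens = list(chain.from_iterable(t.split() for t in texts))
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical toy corpus of short texts.
corpus = ["good morning", "morning coffee", "good coffee"]

exact = bigrams_per_text(corpus)
stream = bigrams_concatenated(corpus)

# Bigrams that exist only because texts were glued together.
spurious = stream - exact
```

In this toy corpus the per-text count finds three bigrams, while the concatenated stream adds two boundary bigrams. With short texts there are many boundaries relative to the number of tokens, so under this assumption the relative error would grow as texts get shorter.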