Calibration of Google Trends Time Series
This addresses a data quality issue for researchers using Google Trends, but it is an incremental improvement focused on calibration rather than a fundamental change.
The paper tackles the problem of rounding errors in Google Trends data, which can produce uninformative time series, by proposing Google Trends Anchor Bank (G-TAB) to calibrate query popularity on a common scale without such errors, demonstrating efficiency with few search steps in evaluation.
Google Trends is a tool that allows researchers to analyze the popularity of Google search queries across time and space. In a single request, users can obtain time series for up to 5 queries on a common scale, normalized to the range from 0 to 100 and rounded to integer precision. Despite the overall value of Google Trends, rounding causes major problems, to the extent that entirely uninformative, all-zero time series may be returned for unpopular queries when requested together with more popular queries. We address this issue by proposing Google Trends Anchor Bank (G-TAB), an efficient solution for the calibration of Google Trends data. Our method expresses the popularity of an arbitrary number of queries on a common scale without being compromised by rounding errors. The method proceeds in two phases. In the offline preprocessing phase, an "anchor bank" is constructed, a set of queries spanning the full spectrum of popularity, all calibrated against a common reference query by carefully chaining together multiple Google Trends requests. In the online deployment phase, any given search query is calibrated by performing an efficient binary search in the anchor bank. Each search step requires one Google Trends request, but few steps suffice, as we demonstrate in an empirical evaluation. We make our code publicly available as an easy-to-use library at https://github.com/epfl-dlab/GoogleTrendsAnchorBank.