Shailesh Bavadekar

CR
h-index29
5papers
787citations
Novelty20%
AI Score24

5 Papers

IRDec 16, 2022
Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification

Jai Gupta, Yi Tay, Chaitanya Kamath et al.

With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic. Given the protection they provide, vaccines are becoming mandatory in certain social and professional settings. This paper presents a classification model for detecting COVID-19 vaccination related search queries, a machine learning model that is used to generate search insights for COVID-19 vaccinations. The proposed method combines and leverages advancements from modern state-of-the-art (SOTA) natural language understanding (NLU) techniques such as pretrained Transformers with traditional dense features. We propose a novel approach of considering dense features as memory tokens that the model can attend to. We show that this new modeling approach enables a significant improvement to the Vaccine Search Insights (VSI) task, improving a strong well-established gradient-boosting baseline by relative +15% improvement in F1 score and +14% in precision.

LGOct 30, 2024
Community search signatures as foundation features for human-centered geospatial modeling

Mimi Sun, Chaitanya Kamath, Mohit Agarwal et al.

Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used in time series modeling across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In zip codes with a population greater than 3000 that cover over 95% of the contiguous US population, our models for predicting missing values in a 20% set of holdout counties achieve an average $R^2$ score of 0.74 across 21 health variables, and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state of the art methods using satellite imagery features.

CRJul 2, 2021
Google COVID-19 Vaccination Search Insights: Anonymization Process Description

Shailesh Bavadekar, Adam Boulanger, John Davis et al.

This report describes the aggregation and anonymization process applied to the COVID-19 Vaccination Search Insights (published at http://goo.gle/covid19vaccinationinsights), a publicly available dataset showing aggregated and anonymized trends in Google searches related to COVID-19 vaccination. The applied anonymization techniques protect every user's daily search activity related to COVID-19 vaccinations with $(\varepsilon, δ)$-differential privacy for $\varepsilon = 2.19$ and $δ= 10^{-5}$.

CRSep 2, 2020
Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (version 1.0)

Shailesh Bavadekar, Andrew Dai, John Davis et al.

This report describes the aggregation and anonymization process applied to the initial version of COVID-19 Search Trends symptoms dataset (published at https://goo.gle/covid19symptomdataset on September 2, 2020), a publicly available dataset that shows aggregated, anonymized trends in Google searches for symptoms (and some related topics). The anonymization process is designed to protect the daily symptom search activity of every user with $\varepsilon$-differential privacy for $\varepsilon$ = 1.68.

CRApr 8, 2020
Google COVID-19 Community Mobility Reports: Anonymization Process Description (version 1.1)

Ahmet Aktay, Shailesh Bavadekar, Gwen Cossoul et al.

This document describes the aggregation and anonymization process applied to the initial version of Google COVID-19 Community Mobility Reports (published at http://google.com/covid19/mobility on April 2, 2020), a publicly available resource intended to help public health authorities understand what has changed in response to work-from-home, shelter-in-place, and other recommended policies aimed at flattening the curve of the COVID-19 pandemic. Our anonymization process is designed to ensure that no personal data, including an individual's location, movement, or contacts, can be derived from the resulting metrics. The high-level description of the procedure is as follows: we first generate a set of anonymized metrics from the data of Google users who opted in to Location History. Then, we compute percentage changes of these metrics from a baseline based on the historical part of the anonymized metrics. We then discard a subset which does not meet our bar for statistical reliability, and release the rest publicly in a format that compares the result to the private baseline.