IRMay 24, 2025
Improving Ad matching via Cluster-Adaptive Keyword Expansion and Relevance tuningDipanwita Saha, Anis Zaman, Hua Zou et al.
In search advertising, keyword matching connects user queries with relevant ads. While token-based matching increases ad coverage, it can reduce relevance due to overly permissive semantic expansion. This work extends keyword reach through document-side semantic keyword expansion, using a language model to broaden token-level matching without altering queries. We propose a solution using a pre-trained siamese model to generate dense vector representations of ad keywords and identify semantically related variants through nearest neighbor search. To maintain precision, we introduce a cluster-based thresholding mechanism that adjusts similarity cutoffs based on local semantic density. Each expanded keyword maps to a group of seller-listed items, which may only partially align with the original intent. To ensure relevance, we enhance the downstream relevance model by adapting it to the expanded keyword space using an incremental learning strategy with a lightweight decision tree ensemble. This system improves both relevance and click-through rate (CTR), offering a scalable, low-latency solution adaptable to evolving query behavior and advertising inventory.
CYOct 28, 2020
Detecting Individuals with Depressive Disorder fromPersonal Google Search and YouTube History LogsBoyu Zhang, Anis Zaman, Rupam Acharyya et al.
Depressive disorder is one of the most prevalent mental illnesses among the global population. However, traditional screening methods require exacting in-person interviews and may fail to provide immediate interventions. In this work, we leverage ubiquitous personal longitudinal Google Search and YouTube engagement logs to detect individuals with depressive disorder. We collected Google Search and YouTube history data and clinical depression evaluation results from $212$ participants ($99$ of them suffered from moderate to severe depressions). We then propose a personalized framework for classifying individuals with and without depression symptoms based on mutual-exciting point process that captures both the temporal and semantic aspects of online activities. Our best model achieved an average F1 score of $0.77 \pm 0.04$ and an AUC ROC of $0.81 \pm 0.02$.
CYSep 5, 2020
The Relationship between Deteriorating Mental Health Conditions and Longitudinal Behavioral Changes in Google and YouTube Usages among College Students in the United States during COVID-19: Observational StudyAnis Zaman, Boyu Zhang, Ehsan Hoque et al.
Mental health problems among the global population are worsened during the coronavirus disease (COVID-19). How individuals engage with online platforms such as Google Search and YouTube undergoes drastic shifts due to pandemic and subsequent lockdowns. Such ubiquitous daily behaviors on online platforms have the potential to capture and correlate with clinically alarming deteriorations in mental health profiles in a non-invasive manner. The goal of this study is to examine, among college students, the relationship between deteriorating mental health conditions and changes in user behaviors when engaging with Google Search and YouTube during COVID-19. This study recruited a cohort of 49 students from a U.S. college campus during January 2020 (prior to the pandemic) and measured the anxiety and depression levels of each participant. This study followed up with the same cohort during May 2020 (during the pandemic), and the anxiety and depression levels were assessed again. The longitudinal Google Search and YouTube history data were anonymized and collected. From individual-level Google Search and YouTube histories, we developed 5 signals that can quantify shifts in online behaviors during the pandemic. We then assessed the differences between groups with and without deteriorating mental health profiles in terms of these features. Significant features included late-night online activities, continuous usages, and time away from the internet, porn consumptions, and keywords associated with negative emotions, social activities, and personal affairs. Though further studies are required, our results demonstrated the feasibility of utilizing pervasive online data to establish non-invasive surveillance systems for mental health conditions that bypasses many disadvantages of existing screening methods.
HCJul 1, 2020
Individual-level Anxiety Detection and Prediction from Longitudinal YouTube and Google Search Engagement LogsAnis Zaman, Boyu Zhang, Vincent Silenzio et al.
Anxiety disorder is one of the world's most prevalent mental health conditions, arising from complex interactions of biological and environmental factors and severely interfering one's ability to lead normal life activities. Current methods for detecting anxiety heavily rely on in-person interviews, which can be expensive, time-consuming, and blocked by social stigmas. In this work, we propose an alternative method to identify individuals with anxiety and further estimate their levels of anxiety using personal online activity histories from YouTube and the Google Search engine, platforms that are used by millions of people daily. We ran a longitudinal study and collected multiple rounds of anonymized YouTube and Google Search logs from volunteering participants, along with their clinically validated ground-truth anxiety assessment scores. We then developed explainable features that capture both the temporal and contextual aspects of online behaviors. Using those, we were able to train models that (i) identify individuals having anxiety disorder with an average F1 score of 0.83 and (ii) assess the level of anxiety by predicting the gold standard Generalized Anxiety Disorder 7-item scores (ranges from 0 to 21) with a mean square error of 1.87 based on the ubiquitous individual-level online engagement data. Our proposed anxiety assessment framework is cost-effective, time-saving, scalable, and opens the door for it to be deployed in real-world clinical settings, empowering care providers and therapists to learn about anxiety disorders of patients non-invasively at any moment in time.