IR · Oct 26, 2016

Inferring individual attributes from search engine queries and auxiliary information

arXiv:1610.08442v1 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses cohort identification for researchers in fields such as medicine, enabling studies of sensitive topics from anonymized data. The advance is incremental: it builds on existing methods for trait inference.

The paper tackles the problem of identifying users with specific traits in anonymized search engine data. It introduces an algorithm that combines a small labeled set with population statistics to label unseen examples. The approach is validated on political data and applied to two medical tasks: identifying users whose searches suggest certain cancers, and predicting disease spread from partial epidemiological data.

Internet data has surfaced as a primary source for investigation of different aspects of human behavior. A crucial step in such studies is finding a suitable cohort (i.e., a set of users) that shares a common trait of interest to researchers. However, direct identification of users sharing this trait is often impossible, as the data available to researchers is usually anonymized to preserve user privacy. To facilitate research on specific topics of interest, especially in medicine, we introduce an algorithm for identifying a trait of interest in anonymous users. We illustrate how a small set of labeled examples, together with statistical information about the entire population, can be aggregated to obtain labels on unseen examples. We validate our approach using labeled data from the political domain. We provide two applications of the proposed algorithm to the medical domain. In the first, we demonstrate how to identify users whose search patterns indicate they might be suffering from certain types of cancer. In the second, we detail an algorithm to predict the distribution of diseases given their incidence in a subset of the population at study, making it possible to predict disease spread from partial epidemiological data.
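As a rough illustration of how a small labeled set and population-level statistics might be combined, the sketch below scores anonymous users with a simple linear scorer fit on a few labeled examples, then uses a known per-region rate to decide how many users in each region to label positive. This is not the paper's algorithm, which the abstract does not spell out. The feature vectors (imagined as query-topic frequencies), the region names, the rates, and the function names `train_scorer` and `label_with_population_rates` are all hypothetical.

```python
from collections import defaultdict


def train_scorer(labeled):
    """Fit a toy linear scorer from (feature_vector, label) pairs.

    Assumption: labels are 0 or 1 and both classes are present.
    The weights are the difference between the class mean vectors.
    """
    dim = len(labeled[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in labeled:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    means = {y: [s / counts[y] for s in sums[y]] for y in (0, 1)}
    weights = [p - n for p, n in zip(means[1], means[0])]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def label_with_population_rates(users, score, region_rates):
    """Label unseen users so each region matches its known positive rate.

    users: list of (user_id, region, feature_vector).
    region_rates: hypothetical map from region to the known fraction
    of positive individuals in that region.
    """
    by_region = defaultdict(list)
    for uid, region, x in users:
        by_region[region].append((score(x), uid))
    labels = {}
    for region, scored in by_region.items():
        scored.sort(reverse=True)  # highest-scoring users first
        k = round(region_rates[region] * len(scored))
        for rank, (_, uid) in enumerate(scored):
            labels[uid] = 1 if rank < k else 0
    return labels


# Made-up data: two features per user, two regions.
labeled = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.8], 0)]
users = [
    ("u1", "A", [0.8, 0.1]),
    ("u2", "A", [0.2, 0.9]),
    ("u3", "A", [0.6, 0.3]),
    ("u4", "A", [0.1, 0.7]),
    ("u5", "B", [0.7, 0.2]),
    ("u6", "B", [0.3, 0.6]),
]
region_rates = {"A": 0.5, "B": 0.5}

labels = label_with_population_rates(users, train_scorer(labeled), region_rates)
print(labels)
```

Using the population rate to set a per-region cutoff is one simple way to let aggregate statistics constrain individual labels; a real system would need a far stronger scorer and a principled treatment of uncertainty.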

