CR · Apr 8
Understanding Data Collection, Brokerage, and Spam in the Lead Marketing Ecosystem
Yash Vekaria, Nurullah Demir, Konrad Kollnig et al.
The lead marketing ecosystem enables the collection, sale, and use of personal data submitted via web forms to deliver personalized quotes in high-value verticals such as insurance. Despite its scale and the sensitivity of the data it collects, this ecosystem remains largely unexplored by the research community. We present the first empirical study of privacy and spam risks in lead marketing, developing an end-to-end measurement framework that traces data flows from collection to consumer contact. Our setup instruments over 100 health-related lead-generation websites and monitors 200 controlled phone numbers and email addresses to understand downstream marketing practices. On these lead-generation websites, we observe highly personal and sensitive health information being shared with more than 70 distinct third parties. By purchasing our own and other organic leads from three major lead platforms, we uncover deceptive brokerage practices in which consumer data is sold to unvetted buyers and is often augmented or fabricated with attributes such as health status and weight. In total, we received over 8,000 telemarketing phone calls, 600 text messages, and 200 emails, with calls often beginning within seconds of form submission. Many campaigns relied on VoIP-based neighbor spoofing and high-frequency dialing, at times rendering phones unusable. Our experiments with phone and email opt-outs suggest that phone-based opt-outs help the most, although none completely stopped marketing communications. Analysis of 7,432 Better Business Bureau (BBB) complaints and reviews corroborates these findings from the consumer perspective. Overall, our results reveal a highly interconnected and non-compliant lead marketing ecosystem that aggressively monetizes sensitive consumer data.
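Neighbor spoofing is commonly described as forging the caller ID so that it shares the recipient's area code and exchange, which makes a call look local. The following is a minimal sketch of a heuristic under that assumption (a six-digit NANP prefix match); the function names and sample numbers are hypothetical, and this is not the study's own classification method.

```python
import re


def normalize_nanp(number: str) -> str:
    """Strip formatting and a leading US country code, returning 10 digits."""
    digits = re.sub(r"\D", "", number)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit NANP number: {number!r}")
    return digits


def is_neighbor_spoofed(caller: str, recipient: str, prefix_len: int = 6) -> bool:
    """Hypothetical heuristic: flag a caller ID that shares the recipient's
    area code and exchange (first `prefix_len` digits) but is not the
    recipient's own number."""
    c, r = normalize_nanp(caller), normalize_nanp(recipient)
    return c != r and c[:prefix_len] == r[:prefix_len]


# Fictional 555-01xx numbers, used for illustration only.
calls = ["+1 (415) 555-0187", "415-555-0142", "212-555-0199"]
mine = "415-555-0100"
flagged = [c for c in calls if is_neighbor_spoofed(c, mine)]
print(flagged)
```

A prefix match alone cannot prove spoofing, since genuinely local callers share the same prefix; in practice such a heuristic would be combined with other signals such as call volume.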
CR · Mar 16
Keys on Doormats: Exposed API Credentials on the Web
Nurullah Demir, Yash Vekaria, Georgios Smaragdakis et al.
Application programming interfaces (APIs) have become a central part of the modern IT environment, allowing developers to enrich the functionality of applications and interact with third parties such as cloud and payment providers. This interaction often relies on authentication mechanisms built on sensitive credentials, such as API keys and tokens, that require secure handling. Exposure of these credentials can have significant consequences for organizations, as malicious attackers can gain access to the related services. Previous studies have shown that such credentials are exposed in environments such as cloud platforms and GitHub. However, exposure on the web remains unexplored. In this paper, we study the exposure of credentials on the web by analyzing 10M webpages. Our findings reveal that API credentials are widely and publicly exposed on the web, including on highly popular and critical webpages such as those of global banks and firmware developers. We identify 1,748 distinct credentials from 14 service providers (e.g., cloud and payment providers) across nearly 10,000 webpages. Moreover, our analysis of archived data suggests that credentials remain exposed for periods ranging from a month to several years. We characterize web-specific exposure vectors and root causes, finding that most exposures originate from JavaScript environments. We also discuss the outcomes of our responsible disclosure efforts, which led to a substantial reduction in credential exposure on the web.
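To make the idea of finding exposed credentials in page source concrete, here is a minimal pattern-matching sketch. The three regular expressions are widely used public heuristics for AWS access key IDs, Google API keys, and Stripe live secret keys; they are assumptions for illustration, not the paper's detection rules or its list of 14 providers. The function name `find_credentials` is hypothetical.

```python
import re
from typing import Dict, List

# Illustrative key-format heuristics (assumptions, not the paper's rule set).
# Real scanners use many more rules and validate candidates with the provider.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Lookarounds instead of \b, because Google keys may contain '-' and '_'.
    "google_api_key": re.compile(r"(?<![0-9A-Za-z_-])AIza[0-9A-Za-z_-]{35}(?![0-9A-Za-z_-])"),
    "stripe_live_secret": re.compile(r"\bsk_live_[0-9A-Za-z]{24,}\b"),
}


def find_credentials(source: str) -> Dict[str, List[str]]:
    """Return candidate credentials found in a page's HTML or JavaScript."""
    hits: Dict[str, List[str]] = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(source)))
        if matches:
            hits[name] = matches
    return hits


# Fake values: AKIAIOSFODNN7EXAMPLE is AWS's documented example key ID.
fake_google = "AIza" + "x" * 35
page = 'var cfg = {apiKey: "' + fake_google + '", aws: "AKIAIOSFODNN7EXAMPLE"};'
print(find_credentials(page))
```

Matching on key format only yields candidates. Whether a match is a live secret (rather than a publishable key, placeholder, or revoked value) has to be established separately.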
CR · Feb 3, 2022
Towards Understanding First-Party Cookie Tracking in the Field
Nurullah Demir, Daniel Theis, Tobias Urban et al.
Third-party web tracking is a common and widely used technique on the Web. Almost every step users take is tracked, analyzed, and later used for different purposes (e.g., online advertising). Different defense mechanisms have emerged to counter these practices (e.g., the recent move by browser vendors to ban all third-party cookies). However, all of these countermeasures target only third-party trackers and ignore the first party, because the prevailing narrative is that first-party monitoring is mostly used to improve the service itself (e.g., analytics services). In this paper, we present a large-scale measurement study of tracking that is performed by the first party but used by a third party to circumvent standard tracking prevention techniques (i.e., the first party performs the tracking on behalf of the third party). We visit the top 15,000 websites to analyze first-party cookies used to track users, as well as a technique called "DNS CNAME cloaking", which a third party can use to place first-party cookies. Using this data, we show that 76% of the sites in our dataset effectively use such tracking techniques, and, in a long-running analysis, that the usage of such cookies increased by more than 50% over 2021. Furthermore, we shed light on the ecosystem of first-party trackers and find that established trackers already use such tracking, presumably to evade tracker blocking.
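In DNS CNAME cloaking, a subdomain of the visited site carries a CNAME record that points to a tracker's domain, so requests to it, and cookies set through it, look first-party to the browser. The following is a minimal detection sketch under stated assumptions. The CNAME records are passed in as a dictionary standing in for real DNS lookups. The registrable domain is approximated as the last two labels, whereas a real implementation would use the Public Suffix List. All domains and function names are hypothetical, and this is not the paper's measurement pipeline.

```python
from typing import Dict, List, Optional


def registrable_domain(host: str) -> str:
    """Naive eTLD+1 (last two labels). Assumption: real code would consult
    the Public Suffix List to handle suffixes such as co.uk."""
    labels = host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def cname_chain(host: str, records: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow CNAME records from `host`; `records` stands in for DNS queries."""
    chain: List[str] = []
    current = host
    for _ in range(max_hops):  # bound the walk in case of CNAME loops
        target = records.get(current)
        if target is None:
            break
        chain.append(target)
        current = target
    return chain


def cloaked_target(subdomain: str, site: str, records: Dict[str, str]) -> Optional[str]:
    """Return the first CNAME target outside the site's registrable domain,
    i.e. a first-party-looking host that actually resolves to a third party."""
    first_party = registrable_domain(site)
    for target in cname_chain(subdomain, records):
        if registrable_domain(target) != first_party:
            return target
    return None


# Hypothetical DNS data for illustration only.
records = {
    "metrics.example.com": "example.com.tracker-cdn.net",
    "example.com.tracker-cdn.net": "edge.tracker-cdn.net",
    "static.example.com": "assets.example.com",
}
print(cloaked_target("metrics.example.com", "www.example.com", records))
print(cloaked_target("static.example.com", "www.example.com", records))
```

Here `metrics.example.com` is flagged because its chain leaves `example.com`, while `static.example.com` only aliases another first-party host and is not flagged.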