LGMay 26
Causal Risk Minimization for High-Dimensional TreatmentsNikita Dhawan, Arnav Paruthi, Andrew Kim et al.
Predicting the effect of interventions with many possible variations, e.g., therapeutic content that affects mental health outcomes or an earnings call transcript that drives movement in share price, is useful across several domains. However, classical causal estimators tend to assume that all possible interventions are observed, which is infeasible when interventions vary widely, for instance, in the space of all text strings. We adapt a well-known approach of recasting causal inference as a learning problem, to address high-dimensional treatment spaces. Specifically, under standard assumptions like no unobserved confounding, we show that causal error decomposes into a series of moment-balancing errors of increasing order, and design objectives that directly improve causal estimation. We also show how to project the effect of a high-dimensional treatment onto lower-dimensional treatment attributes, which allows a single model to answer several causal questions without additional attribute-specific training. We empirically evaluate our estimators in settings with high-dimensional continuous, discrete, and text treatments, the last of which used a semi-synthetic dataset of Amazon Reviews. Our experiments demonstrate the benefit of higher-order balance error optimization and competitive performance of projected causal estimates with attribute-specific estimators.
SIAug 7, 2020
Change-Point Analysis of Cyberbullying-Related Twitter Discussions During COVID-19Sanchari Das, Andrew Kim, Sayar Karmakar
Due to the outbreak of COVID-19, users are increasingly turning to online services. An increase in social media usage has also been observed, leading to the suspicion that this has also raised cyberbullying. In this initial work, we explore the possibility of an increase in cyberbullying incidents due to the pandemic and high social media usage. To evaluate this trend, we collected 454,046 cyberbullying-related public tweets posted between January 1st, 2020 -- June 7th, 2020. We summarize the tweets containing multiple keywords into their daily counts. Our analysis showed the existence of at most one statistically significant changepoint for most of these keywords, which were primarily located around the end of March. Almost all these changepoint time-locations can be attributed to COVID-19, which substantiates our initial hypothesis of an increase in cyberbullying through analysis of discussions over Twitter.
CRJun 29, 2020
Quantifying Susceptibility to Spear Phishing in a High School Environment Using Signal Detection TheoryPloy Unchit, Sanchari Das, Andrew Kim et al.
Spear phishing is a deceptive attack that uses social engineering to obtain confidential information through targeted victimization. It is distinguished by its use of social cues and personalized information to target specific victims. Previous work on resilience to spear phishing has focused on convenience samples, with a disproportionate focus on students. In contrast, here, we report on an evaluation of a high school community. We engaged 57 high school students and faculty members (12 high school students, 45 staff members) as participants in research utilizing signal detection theory (SDT). Through scenario-based analysis, participants tasked with distinguishing phishing emails from authentic emails. The results revealed an overconfidence bias in self-detection from the participants, regardless of their technical background. These findings are critical for evaluating the decision-making of underrepresented populations and protecting people from potential spear phishing attacks by examining human susceptibility.
CRAug 16, 2019
All About Phishing: Exploring User Research through a Systematic Literature ReviewSanchari Das, Andrew Kim, Zachary Tingle et al.
Phishing is a well-known cybersecurity attack that has rapidly increased in recent years. It poses legitimate risks to businesses, government agencies, and all users due to sensitive data breaches, subsequent financial and productivity losses, and social and personal inconvenience. Often, these attacks use social engineering techniques to deceive end-users, indicating the importance of user-focused studies to help prevent future attacks. We provide a detailed overview of phishing research that has focused on users by conducting a systematic literature review of peer-reviewed academic papers published in ACM Digital Library. Although published work on phishing appears in this data set as early as 2004, we found that of the total number of papers on phishing (N = 367) only 13.9% (n = 51) focus on users by employing user study methodologies such as interviews, surveys, and in-lab studies. Even within this small subset of papers, we note a striking lack of attention to reporting important information about methods and participants (e.g., the number and nature of participants), along with crucial recruitment biases in some of the research.