Hannah Li

CY
h-index67
8papers
57citations
Novelty50%
AI Score47

8 Papers

20.0CYJun 2
Dropping Standardized Testing for Admissions Trades Off Information and Access

Nikhil Garg, Hannah Li, Faidra Monachou

We study the role of information and access in capacity-constrained selection problems with fairness concerns. We develop a statistical discrimination framework, where each applicant has multiple features and is potentially strategic. The model formalizes the trade-off between the (potentially positive) informational role of a feature and its (negative) exclusionary nature when members of different social groups have unequal access to this feature. Our framework finds a natural application to policy debates on dropping standardized testing in admissions. Our primary takeaway is that the decision to drop a feature (such as test scores) cannot be made without the joint context of the information provided by other features and how the requirement affects the applicant pool composition. Dropping a feature may exacerbate disparities by decreasing the amount of information available for each applicant, especially those from non-traditional backgrounds. However, in the presence of access barriers to a feature, the interaction between the informational environment and the effect of access barriers on the applicant pool size becomes highly complex. Furthermore, we consider an extension with two schools and costly tests, where strategic students decide whether to take the test or not. Our theoretical results reveal that the students' test-taking behavior can be non-monotonic. We characterize the two-school policy equilibria and show that each school's optimal decision to drop the test critically depends on the other school's test policy. Finally, using calibrated simulations, we demonstrate the presence of practical instances where the decision to eliminate standardized testing improves or worsens all metrics.

81.6MEApr 15
Deployment of AI-Assisted Interventions: Capacity Constraints and Noisy Compliance

Carri W. Chan, Yi Han, Hannah Li et al.

AI tools increasingly guide targeted interventions in healthcare, education, and recruiting. Algorithms score individuals, trigger outreach to those above a threshold (e.g., high-risk or high-value), and encourage them to request service; then providers deliver service to those who request. Standard practice sets the threshold and selects the algorithm to maximize predictive accuracy, assuming that better predictions yield better outcomes. We show that this approach is suboptimal when limited service capacity and probabilistic behavioral responses influence who receives service. In such settings, the optimal score threshold must balance two effects: ensuring all capacity is filled (utilization) and ensuring high-value individuals are served despite competition between requests (cannibalization). We characterize the optimal threshold and prove that policies based solely on predictive accuracy are generally suboptimal. Further, because optimal thresholds vary with service capacity, algorithm selection metrics like AUC, which weight all thresholds equally, are misaligned with operational performance. We introduce a new metric--Operational AUC (OpAUC)--and show it leads to optimal algorithm selection. Finally, we conduct a case study on sepsis early warning data and illustrate the magnitude of improvement that can be achieved from improved threshold and algorithm selection.

GTDec 30, 2023
Matching of Users and Creators in Two-Sided Markets with Departures

Daniel Huttenlocher, Hannah Li, Liang Lyu et al.

Many online platforms of today, including social media sites, are two-sided markets bridging content creators and users. Most of the existing literature on platform recommendation algorithms largely focuses on user preferences and decisions, and does not simultaneously address creator incentives. We propose a model of content recommendation that explicitly focuses on the dynamics of user-content matching, with the novel property that both users and creators may leave the platform permanently if they do not experience sufficient engagement. In our model, each player decides to participate at each time step based on utilities derived from the current match: users based on alignment of the recommended content with their preferences, and creators based on their audience size. We show that a user-centric greedy algorithm that does not consider creator departures can result in arbitrarily poor total engagement, relative to an algorithm that maximizes total engagement while accounting for two-sided departures. Moreover, in stark contrast to the case where only users or only creators leave the platform, we prove that with two-sided departures, approximating maximum total engagement within any constant factor is NP-hard. We present two practical algorithms, one with performance guarantees under mild assumptions on user preferences, and another that tends to outperform algorithms that ignore two-sided departures in practice.

CYSep 23, 2025
A Mega-Study of Digital Twins Reveals Strengths, Weaknesses and Opportunities for Further Improvement

Tianyi Peng, George Gui, Daniel J. Merlau et al.

Digital representations of individuals ("digital twins") promise to transform social science and decision-making. Yet it remains unclear whether such twins truly mirror the people they emulate. We conducted 19 preregistered studies with a representative U.S. panel and their digital twins, each constructed from rich individual-level data, enabling direct comparisons between human and twin behavior across a wide range of domains and stimuli (including never-seen-before ones). Twins reproduced individual responses with 75% accuracy and seemingly low correlation with human answers (approximately 0.2). However, this apparently high accuracy was no higher than that achieved by generic personas based on demographics only. In contrast, correlation improved when twins incorporated detailed personal information, even outperforming traditional machine learning benchmarks that require additional data. Twins exhibited systematic strengths and weaknesses - performing better in social and personality domains, but worse in political ones - and were more accurate for participants with higher education, higher income, and moderate political views and religious attendance. Together, these findings delineate both the promise and the current limits of digital twins: they capture some relative differences among individuals but not yet the unique judgments of specific people. All data and code are publicly available to support the further development and evaluation of digital twin pipelines.

CYMay 9, 2024
Measuring Strategization in Recommendation: Users Adapt Their Behavior to Shape Future Content

Sarah H. Cen, Andrew Ilyas, Jennifer Allen et al.

Most modern recommendation algorithms are data-driven: they generate personalized recommendations by observing users' past behaviors. A common assumption in recommendation is that how a user interacts with a piece of content (e.g., whether they choose to "like" it) is a reflection of the content, but not of the algorithm that generated it. Although this assumption is convenient, it fails to capture user strategization: that users may attempt to shape their future recommendations by adapting their behavior to the recommendation algorithm. In this work, we test for user strategization by conducting a lab experiment and survey. To capture strategization, we adopt a model in which strategic users select their engagement behavior based not only on the content, but also on how their behavior affects downstream recommendations. Using a custom music player that we built, we study how users respond to different information about their recommendation algorithm as well as to different incentives about how their actions affect downstream outcomes. We find strong evidence of strategization across outcome metrics, including participants' dwell time and use of "likes." For example, participants who are told that the algorithm mainly pays attention to "likes" and "dislikes" use those functions 1.9x more than participants told that the algorithm mainly pays attention to dwell time. A close analysis of participant behavior (e.g., in response to our incentive conditions) rules out experimenter demand as the main driver of these trends. Further, in our post-experiment survey, nearly half of participants self-report strategizing "in the wild," with some stating that they ignore content they actually like to avoid over-recommendation of that content in the future. Together, our findings suggest that user strategization is common and that platforms cannot ignore the effect of their algorithms on user behavior.

LGSep 18, 2018
Exploration vs. Exploitation in Team Formation

Ramesh Johari, Vijay Kamble, Anilesh K. Krishnaswamy et al.

An online labor platform faces an online learning problem in matching workers with jobs and using the performance on these jobs to create better future matches. This learning problem is complicated by the rise of complex tasks on these platforms, such as web development and product design, that require a team of workers to complete. The success of a job is now a function of the skills and contributions of all workers involved, which may be unknown to both the platform and the client who posted the job. These team matchings result in a structured correlation between what is known about the individuals and this information can be utilized to create better future matches. We analyze two natural settings where the performance of a team is dictated by its strongest and its weakest member, respectively. We find that both problems pose an exploration-exploitation tradeoff between learning the performance of untested teams and repeating previously tested teams that resulted in a good performance. We establish fundamental regret bounds and design near-optimal algorithms that uncover several insights into these tradeoffs.

CRJun 15, 2017
Horcrux: A Password Manager for Paranoids

Hannah Li, David Evans

Vulnerabilities in password managers are unremitting because current designs provide large attack surfaces, both at the client and server. We describe and evaluate Horcrux, a password manager that is designed holistically to minimize and decentralize trust, while retaining the usability of a traditional password manager. The prototype Horcrux client, implemented as a Firefox add-on, is split into two components, with code that has access to the user's master's password and any key material isolated into a small auditable component, separate from the complexity of managing the user interface. Instead of exposing actual credentials to the DOM, a dummy username and password are autofilled by the untrusted component. The trusted component intercepts and modifies POST requests before they are encrypted and sent over the network. To avoid trusting a centralized store, stored credentials are secret-shared over multiple servers. To provide domain and username privacy, while maintaining resilience to off-line attacks on a compromised password store, we incorporate cuckoo hashing in a way that ensures an attacker cannot determine if a guessed master password is correct. Our approach only works for websites that do not manipulate entered credentials in the browser client, so we conducted a large-scale experiment that found the technique appears to be compatible with over 98% of tested login forms.

CRJun 11, 2017
Decentralized Certificate Authorities

Bargav Jayaraman, Hannah Li, David Evans

The security of TLS depends on trust in certificate authorities, and that trust stems from their ability to protect and control the use of a private signing key. The signing key is the key asset of a certificate authority (CA), and its value is based on trust in the corresponding public key which is primarily distributed by browser vendors. Compromise of a CA private key represents a single point-of-failure that could have disastrous consequences, so CAs go to great lengths to attempt to protect and control the use of their private keys. Nevertheless, keys are sometimes compromised and may be misused accidentally or intentionally by insiders. We propose splitting a CA's private key among multiple parties, and producing signatures using a generic secure multi-party computation protocol that never exposes the actual signing key. This could be used by a single CA to reduce the risk that its signing key would be compromised or misused. It could also enable new models for certificate generation, where multiple CAs would need to agree and cooperate before a new certificate can be generated, or even where certificate generation would require cooperation between a CA and the certificate recipient (subject). Although more efficient solutions are possible with custom protocols, we demonstrate the feasibility of implementing a decentralized CA using a generic two-party secure computation protocol with an evaluation of a prototype implementation that uses secure two-party computation to generate certificates signed using ECDSA on curve secp192k1.