Jennifer King

CY
h-index: 13
6 papers
122 citations
Novelty: 32%
AI Score: 44

6 Papers

4.7 CY May 20
Privacy Without Remedy: An Assessment of Data Broker Compliance with California Privacy Law

Anna-Maria Gueorguieva, Jennifer King, Apoorva Panidapu et al.

California's consumer privacy law is widely deemed to be the most protective in the United States and is one of the few to expressly regulate third-party entities that buy and sell consumer data (data brokers). We offer the first empirical assessment of data broker compliance with the 2018 California Consumer Privacy Act (CCPA) and the 2023 Delete Act, which requires data brokers to register with the state and report consumer rights request metrics annually. First, we demonstrate that only 9% of 522 registered data brokers were fully compliant with transparency requirements after the Delete Act took effect, although we do identify slight improvements over time. Second, we descriptively characterize wide heterogeneity across data brokers in the volume of consumer rights requests received, with many reporting none. We bring in external business data to explore correlates of this variation, a challenge given the general opacity of broker business practices. Third, in an audit of a sample of 250 data brokers' consumer request processes, we find that 43% make it impossible for consumers to exercise all privacy rights and 64% introduce at least one design feature that creates substantial friction in the consumer request process. Last, we show how these deficiencies stem from the decentralization of compliance decisions to brokers themselves, enforcement limitations, and regulatory ambiguity. We articulate reforms that could improve consumer privacy, transparency in broker practices, and compliance with these laws.

AI Dec 18, 2025
MIMIC-RD: Can LLMs differentially diagnose rare diseases in real-world clinical settings?

Zilal Eiz AlDin, John Wu, Jeffrey Paul Fung et al.

Despite rare diseases affecting 1 in 10 Americans, their differential diagnosis remains challenging. Due to their impressive recall abilities, large language models (LLMs) have recently been explored for differential diagnosis. Existing approaches to evaluating LLM-based rare disease diagnosis suffer from two critical limitations: they rely on idealized clinical case studies that fail to capture real-world clinical complexity, or they use ICD codes as disease labels, which significantly undercounts rare diseases since many lack direct mappings to comprehensive rare disease databases like Orphanet. To address these limitations, we explore MIMIC-RD, a rare disease differential diagnosis benchmark constructed by directly mapping clinical text entities to Orphanet. Our methodology involved an initial LLM-based mining process followed by validation from four medical annotators to confirm that the identified entities were genuine rare diseases. We evaluated various models on our dataset of 145 patients and found that current state-of-the-art LLMs perform poorly on rare disease differential diagnosis, highlighting the substantial gap between existing capabilities and clinical needs. Based on our findings, we outline several future steps towards improving differential diagnosis of rare diseases.

LG Nov 10, 2021 Code
PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

David Biagioni, Xiangyu Zhang, Dylan Wald et al.

We present the PowerGridworld software package to provide users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training frameworks for reinforcement learning (RL). Although many frameworks exist for training multi-agent RL (MARL) policies, none supports rapid prototyping and development of the environments themselves, especially in the context of heterogeneous (composite, multi-device) power systems where power flow solutions are required to define grid-level variables and costs. PowerGridworld is an open-source software package that helps to fill this gap. To highlight PowerGridworld's key features, we present two case studies and demonstrate learning MARL policies using both OpenAI's multi-agent deep deterministic policy gradient (MADDPG) and RLlib's proximal policy optimization (PPO) algorithms. In both cases, at least some subset of agents incorporates elements of the power flow solution at each time step as part of their reward (negative cost) structures.
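
To make the idea of a multi-agent environment whose rewards depend on a grid-level quantity more concrete, here is a minimal sketch in plain Python. It is not PowerGridworld's API: the names ToyDevice and ToyGridEnv, the feeder limit, and the overload penalty are all hypothetical, and the "power flow" is replaced by a simple sum of device demands.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ToyDevice:
    """Hypothetical controllable load with a preferred power level (kW)."""
    preferred_kw: float
    power_kw: float = 0.0


class ToyGridEnv:
    """Toy multi-agent environment with a Gym-like reset/step interface."""

    def __init__(self, devices: Dict[str, ToyDevice], limit_kw: float, max_steps: int = 10):
        self.devices = devices
        self.limit_kw = limit_kw
        self.max_steps = max_steps
        self.t = 0

    def _grid_overload(self) -> float:
        # Stand-in for a power flow solve: total demand above the feeder limit.
        total = sum(d.power_kw for d in self.devices.values())
        return max(0.0, total - self.limit_kw)

    def _obs(self) -> Dict[str, Tuple[float, float]]:
        # Each agent observes its own power and the shared grid-level signal.
        overload = self._grid_overload()
        return {name: (d.power_kw, overload) for name, d in self.devices.items()}

    def reset(self) -> Dict[str, Tuple[float, float]]:
        self.t = 0
        for d in self.devices.values():
            d.power_kw = 0.0
        return self._obs()

    def step(self, actions: Dict[str, float]):
        # Apply each agent's requested power, clipped to be non-negative.
        for name, kw in actions.items():
            self.devices[name].power_kw = max(0.0, kw)
        self.t += 1
        overload = self._grid_overload()
        # Reward = negative cost: local discomfort plus the shared grid penalty.
        rewards = {
            name: -abs(d.power_kw - d.preferred_kw) - overload
            for name, d in self.devices.items()
        }
        done = self.t >= self.max_steps
        return self._obs(), rewards, done, {"overload_kw": overload}


env = ToyGridEnv({"a": ToyDevice(3.0), "b": ToyDevice(4.0)}, limit_kw=5.0)
first_obs = env.reset()
obs, rewards, done, info = env.step({"a": 3.0, "b": 4.0})
```

In this toy setup both agents sit at their preferred power, so their only cost is the shared overload term, which is the kind of coupling through grid-level variables that the abstract describes.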

CY Sep 5, 2025
User Privacy and Large Language Models: An Analysis of Frontier Developers' Privacy Policies

Jennifer King, Kevin Klyman, Emily Capstick et al.

Hundreds of millions of people now regularly interact with large language models via chatbots. Model developers are eager to acquire new sources of high-quality training data as they race to improve model capabilities and win market share. This paper analyzes the privacy policies of six U.S. frontier AI developers to understand how they use their users' chats to train models. Drawing primarily on the California Consumer Privacy Act, we develop a novel qualitative coding schema that we apply to each developer's relevant privacy policies to compare data collection and use practices across the six companies. We find that all six developers appear to employ their users' chat data to train and improve their models by default, and that some retain this data indefinitely. Developers may collect and train on personal information disclosed in chats, including sensitive information such as biometric and health data, as well as files uploaded by users. Four of the six companies we examined appear to include children's chat data for model training, as well as customer data from other products. On the whole, developers' privacy policies often lack essential information about their practices, highlighting the need for greater transparency and accountability. We address the implications of users' lack of consent for the use of their chat data for model training, data security issues arising from indefinite chat data retention, and training on children's chat data. We conclude by providing recommendations to policymakers and developers to address the data privacy challenges posed by LLM-powered chatbots.

HC May 25, 2020
Decentralized is not risk-free: Understanding public perceptions of privacy-utility trade-offs in COVID-19 contact-tracing apps

Tianshi Li, Jackie Yang et al.

Contact-tracing apps have potential benefits in helping health authorities to act swiftly to halt the spread of COVID-19. However, their effectiveness is heavily dependent on their installation rate, which may be influenced by people's perceptions of the utility of these apps and any potential privacy risks due to the collection and release of sensitive user data (e.g., user identity and location). In this paper, we present a survey study that examined people's willingness to install six different contact-tracing apps after informing them of the risks and benefits of each design option (with a U.S.-only sample on Amazon Mechanical Turk, N=208). The six app designs covered two major design dimensions (centralized vs. decentralized, basic contact tracing vs. also providing hotspot information), grounded in our analysis of existing contact-tracing app proposals. Contrary to the assumptions of some prior work, we found that the majority of people in our sample preferred to install apps that use a centralized server for contact tracing, as they are more willing to allow a centralized authority to access the identity of app users than to allow tech-savvy users to infer the identity of diagnosed users. We also found that the majority of our sample preferred to install apps that share diagnosed users' recent locations in public places to show hotspots of infection. Our results suggest that apps that use a centralized architecture with strong security protection to do basic contact tracing and that provide users with other useful information, such as hotspots of infection in public places, may achieve a high adoption rate in the U.S.

OC Nov 8, 2019
Learning-Accelerated ADMM for Distributed Optimal Power Flow

David Biagioni, Peter Graf, Xiangyu Zhang et al.

We propose a novel data-driven method to accelerate the convergence of the Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions. Using previous observations of ADMM trajectories for a given system under varying load, the method trains a recurrent neural network (RNN) to predict the converged values of dual and consensus variables. Given a new realization of system load, a small number of initial ADMM iterations is taken as input to infer the converged values and directly inject them into the iteration. We empirically demonstrate that the online injection of these values into the ADMM iteration accelerates convergence by a significant factor for partitioned 14-, 118- and 2848-bus test systems under differing load scenarios. The proposed method has several advantages: it maintains the security of private decision variables inherent in consensus ADMM; inference is fast and so may be used in online settings; RNN-generated predictions can dramatically improve time to convergence but, by construction, can never result in infeasible ADMM subproblems; and it can be easily integrated into existing software implementations. While we focus on the ADMM formulation of distributed DC-OPF in this paper, the ideas presented can be naturally extended to other distributed optimization problems.
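
The injection idea can be illustrated on a toy consensus problem. The sketch below is not the paper's DC-OPF formulation and contains no RNN: it assumes a scalar problem (minimize the sum of (x_i - a_i)^2 subject to all x_i being equal), and the "prediction" is a hand-picked, slightly inaccurate guess of the converged consensus and dual values. The function name consensus_admm and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def consensus_admm(
    targets: List[float],
    rho: float = 1.0,
    tol: float = 1e-6,
    max_iter: int = 1000,
    inject_at: Optional[int] = None,
    predicted: Optional[Tuple[float, List[float]]] = None,
) -> Tuple[float, int]:
    """Minimize sum_i (x_i - targets[i])**2 subject to x_i = z for all i.

    If `inject_at` and `predicted` are given, the consensus variable z and the
    duals y are overwritten with the predicted values after `inject_at`
    iterations. Returns (z, number of iterations used).
    """
    n = len(targets)
    z = 0.0
    y = [0.0] * n
    for k in range(1, max_iter + 1):
        # Local x-updates: closed-form minimizer of each agent's subproblem.
        x = [(2.0 * a - y_i + rho * z) / (2.0 + rho) for a, y_i in zip(targets, y)]
        z_old = z
        # Consensus update, then dual ascent step.
        z = sum(x_i + y_i / rho for x_i, y_i in zip(x, y)) / n
        y = [y_i + rho * (x_i - z) for x_i, y_i in zip(x, y)]
        primal = max(abs(x_i - z) for x_i in x)
        dual = rho * abs(z - z_old)
        if primal < tol and dual < tol:
            return z, k
        if predicted is not None and k == inject_at:
            # Stand-in for the learned predictor: overwrite z and y in place.
            z, y = predicted[0], list(predicted[1])
    return z, max_iter


targets = [1.0, 3.0]  # exact solution: z = 2, y = [-2, 2]
z_cold, iters_cold = consensus_admm(targets)
z_warm, iters_warm = consensus_admm(
    targets, inject_at=3, predicted=(2.01, [-1.99, 1.99])
)
print(iters_cold, iters_warm)
```

Because the injected values only replace the iterate and the ordinary updates continue afterwards, an imperfect guess still converges to the same solution; it simply starts the remaining iterations closer to it.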