Yunan Chen

CL, Dec 30, 2022

Inconsistencies in Masked Language Models

Tom Young, Yunan Chen, Yang You

Learning to predict masked tokens in a sequence has been shown to be a helpful pretraining objective for powerful language models such as PaLM2. After training, such masked language models (MLMs) can provide distributions of tokens in the masked positions in a sequence. However, this paper shows that distributions corresponding to different masking patterns can demonstrate considerable inconsistencies, i.e., they cannot be derived from a coherent joint distribution when considered together. This fundamental flaw in MLMs can lead to self-contradictory behaviors during inference. On various benchmark datasets including MMLU, MLMs can give different predictions to the same input question. From BERT-base to UL2-20B, we show that such inconsistencies exist ubiquitously in MLMs of diverse sizes and configurations. In light of our observations, we further propose an inference-time strategy for MLMs called Ensemble of Conditionals. It jointly considers a selected range of inconsistent conditionals directly produced by the MLM for the final prediction, which often leads to considerable accuracy improvement.
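As a rough illustration of what it means for conditionals to be impossible to derive from a coherent joint distribution, the sketch below checks two conditionals over a pair of binary tokens. The function names, the toy numbers, and the two-token setup are assumptions made for illustration and are not taken from the paper. The check uses the standard compatibility criterion for strictly positive conditionals: if a joint exists, the ratio P(a|b) / P(b|a) equals P(a) / P(b), so the 2x2 ratio matrix must have rank one.

```python
import math


def conditionals_from_joint(joint):
    """Derive P(A|B) and P(B|A) from a 2x2 joint indexed as joint[a][b]."""
    col = [joint[0][b] + joint[1][b] for b in range(2)]  # P(B=b)
    row = [joint[a][0] + joint[a][1] for a in range(2)]  # P(A=a)
    p_a_given_b = [[joint[a][b] / col[b] for b in range(2)] for a in range(2)]
    p_b_given_a = [[joint[a][b] / row[a] for b in range(2)] for a in range(2)]
    return p_a_given_b, p_b_given_a


def compatible(p_a_given_b, p_b_given_a, rel_tol=1e-9):
    """Return True if two strictly positive 2x2 conditionals share a joint.

    Both tables are indexed [a][b]. The ratio matrix r[a][b] must factor
    as u(a) * v(b), which for a positive 2x2 matrix means its two
    cross products are equal.
    """
    r = [[p_a_given_b[a][b] / p_b_given_a[a][b] for b in range(2)]
         for a in range(2)]
    return math.isclose(r[0][0] * r[1][1], r[0][1] * r[1][0], rel_tol=rel_tol)


# Conditionals derived from one joint are consistent by construction.
consistent_pair = conditionals_from_joint([[0.1, 0.2], [0.3, 0.4]])

# Hypothetical "model outputs": P(A|B) says A depends strongly on B,
# while P(B|A) says B is independent of A. No joint produces both.
inconsistent_pair = (
    [[0.9, 0.2], [0.1, 0.8]],  # P(A=a | B=b), columns sum to 1
    [[0.5, 0.5], [0.5, 0.5]],  # P(B=b | A=a), rows sum to 1
)

if __name__ == "__main__":
    print(compatible(*consistent_pair))
    print(compatible(*inconsistent_pair))
```

In this toy setting the first pair passes the check and the second fails it. An MLM queried with different masking patterns plays the role of the second pair when its predicted conditionals disagree in this way.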

26.9, HC, Mar 27

We Need Granular Sharing of De-Identified Data-But Will Patients Engage? Investigating Health System Leaders' and Patients' Perspectives on A Patient-Controlled Data-Sharing Platform

Xi Lu, Di Hu, An T. Nguyen et al.

Patient-controlled data-sharing systems are increasingly promoted as a way to empower patients with greater autonomy over their health data. Yet it remains unclear how different stakeholders, especially patients and health system leaders, perceive the benefits and challenges of enabling granular control over the sharing of de-identified medical data for research. To address this gap, we developed a high-fidelity prototype of a patient-controlled, web-based consent platform and conducted a two-phase mixed-methods study: semi-structured interviews with 16 health system leaders and a survey with 523 patient participants. While both groups appreciated the potential of such a platform to enhance transparency and autonomy, their views diverged in meaningful ways. Leaders viewed transparency and granular control through the lens of informed consent and institutional ethics, whereas patients interpreted these factors as safeguards against potential risks and uncertainties. Our findings underscore critical tensions, such as that between individual control and research integrity. We offer design implications for building trustworthy, context-aware systems that support flexible granularity, provide ongoing benefit-centered transparency, and adapt to diverse literacy and privacy needs.
