Yue You

HC
4papers
98citations
Novelty35%
AI Score20

4 Papers

STMay 12, 2021
Estimation of population size based on capture recapture designs and evaluation of the estimation reliability

Yue You, Mark van der Laan, Philip Collender et al.

We propose a modern method to estimate population size based on capture-recapture designs of K samples. The observed data is formulated as a sample of n i.i.d. K-dimensional vectors of binary indicators, where the k-th component of each vector indicates the subject being caught by the k-th sample, such that only subjects with nonzero capture vectors are observed. The target quantity is the unconditional probability of the vector being nonzero across both observed and unobserved subjects. We cover models assuming a single constraint (identification assumption) on the K-dimensional distribution such that the target quantity is identified and the statistical model is unrestricted. We present solutions for linear and non-linear constraints commonly assumed to identify capture-recapture models, including no K-way interaction in linear and log-linear models, independence or conditional independence. We demonstrate that the choice of constraint has a dramatic impact on the value of the estimand, showing that it is crucial that the constraint is known to hold by design. For the commonly assumed constraint of no K-way interaction in a log-linear model, the statistical target parameter is only defined when each of the $2^K - 1$ observable capture patterns is present, and therefore suffers from the curse of dimensionality. We propose a targeted MLE based on undersmoothed lasso model to smooth across the cells while targeting the fit towards the single valued target parameter of interest. For each identification assumption, we provide simulated inference and confidence intervals to assess the performance on the estimator under correct and incorrect identifying assumptions. We apply the proposed method, alongside existing estimators, to estimate prevalence of a parasitic infection using multi-source surveillance data from a region in southwestern China, under the four identification assumptions.

HCJan 12, 2021
Self-Diagnosis through AI-enabled Chatbot-based Symptom Checkers: User Experiences and Design Considerations

Yue You, Xinning Gui

Recently, there has been a growing interest in developing AI-enabled chatbot-based symptom checker (CSC) apps in the healthcare market. CSC apps provide potential diagnoses for users and assist them with self-triaging based on Artificial Intelligence (AI) techniques using human-like conversations. Despite the popularity of such CSC apps, little research has been done to investigate their functionalities and user experiences. To do so, we conducted a feature review, a user review analysis, and an interview study. We found that the existing CSC apps lack the functions to support the whole diagnostic process of an offline medical visit. We also found that users perceive the current CSC apps to lack support for a comprehensive medical history, flexible symptom input, comprehensible questions, and diverse diseases and user groups. Based on these results, we derived implications for the future features and conversational design of CSC apps.

HCJan 12, 2021
The Medical Authority of AI: A Study of AI-enabled Consumer-facing Health Technology

Yue You, Yubo Kou, Xianghua Ding et al.

Recently, consumer-facing health technologies such as Artificial Intelligence (AI)-based symptom checkers (AISCs) have sprung up in everyday healthcare practice. AISCs solicit symptom information from users and provide medical suggestions and possible diagnoses, a responsibility that people usually entrust with real-person authorities such as physicians and expert patients. Thus, the advent of AISCs begs a question of whether and how they transform the notion of medical authority in everyday healthcare practice. To answer this question, we conducted an interview study with thirty AISC users. We found that users assess the medical authority of AISCs using various factors including automated decisions and interaction design patterns of AISC apps, associations with established medical authorities like hospitals, and comparisons with other health technologies. We reveal how AISCs are used in healthcare delivery, discuss how AI transforms conventional understandings of medical authority, and derive implications for designing AI-enabled health technology.

IRJan 7, 2020
Machine-learning classifiers for logographic name matching in public health applications: approaches for incorporating phonetic, visual, and keystroke similarity in large-scale probabilistic record linkage

Philip A. Collender, Zhiyue Tom Hu, Charles Li et al.

Approximate string-matching methods to account for complex variation in highly discriminatory text fields, such as personal names, can enhance probabilistic record linkage. However, discriminating between matching and non-matching strings is challenging for logographic scripts, where similarities in pronunciation, appearance, or keystroke sequence are not directly encoded in the string data. We leverage a large Chinese administrative dataset with known match status to develop logistic regression and Xgboost classifiers integrating measures of visual, phonetic, and keystroke similarity to enhance identification of potentially-matching name pairs. We evaluate three methods of leveraging name similarity scores in large-scale probabilistic record linkage, which can adapt to varying match prevalence and information in supporting fields: (1) setting a threshold score based on predicted quality of name-matching across all record pairs; (2) setting a threshold score based on predicted discriminatory power of the linkage model; and (3) using empirical score distributions among matches and nonmatches to perform Bayesian adjustment of matching probabilities estimated from exact-agreement linkage. In experiments on holdout data, as well as data simulated with varying name error rates and supporting fields, a logistic regression classifier incorporated via the Bayesian method demonstrated marked improvements over exact-agreement linkage with respect to discriminatory power, match probability estimation, and accuracy, reducing the total number of misclassified record pairs by 21% in test data and up to an average of 93% in simulated datasets. Our results demonstrate the value of incorporating visual, phonetic, and keystroke similarity for logographic name matching, as well as the promise of our Bayesian approach to leverage name-matching within large-scale record linkage.