Andrew Mao

CL
h-index: 15
Papers: 4
Citations: 115
Novelty: 53%
AI Score: 27

4 Papers

MED-PH, Nov 13, 2023
Bias-Reduced Neural Networks for Parameter Estimation in Quantitative MRI

Andrew Mao, Sebastian Flassbeck, Jakob Assländer

Purpose: To develop neural network (NN)-based quantitative MRI parameter estimators with minimal bias and a variance close to the Cramér-Rao bound. Theory and Methods: We generalize the mean squared error loss to control the bias and variance of the NN's estimates, which involves averaging over multiple noise realizations of the same measurements during training. Bias and variance properties of the resulting NNs are studied for two neuroimaging applications. Results: In simulations, the proposed strategy reduces the estimates' bias throughout parameter space and achieves a variance close to the Cramér-Rao bound. In vivo, we observe good concordance between parameter maps estimated with the proposed NNs and traditional estimators, such as non-linear least-squares fitting, while state-of-the-art NNs show larger deviations. Conclusion: The proposed NNs have greatly reduced bias compared to those trained using the mean squared error and offer significantly improved computational efficiency over traditional estimators with comparable or better accuracy.
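The idea of averaging over multiple noise realizations of the same measurement can be illustrated with a small sketch. It splits the mean squared error into a squared-bias term and a variance term and weights them separately. The function names, the `var_weight` parameter, and the exact weighting are assumptions made for illustration and are not taken from the paper, whose loss may differ in form.

```python
import numpy as np

def bias_variance_loss(estimates, truth, var_weight=1.0):
    """Hypothetical bias/variance-weighted loss (illustrative only).

    estimates: array of shape (n_samples, n_realizations), the estimator's
               output for several noise realizations of each measurement.
    truth:     array of shape (n_samples,), the ground-truth parameter.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Mean estimate across noise realizations of the same measurement.
    mean_est = estimates.mean(axis=1)
    # Squared bias of the mean estimate with respect to the ground truth.
    bias_sq = (mean_est - truth) ** 2
    # Population variance (ddof=0) across noise realizations.
    variance = estimates.var(axis=1)
    return float(np.mean(bias_sq + var_weight * variance))

def mse_loss(estimates, truth):
    """Plain mean squared error over all samples and realizations."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimates - truth[:, None]) ** 2))
```

With `var_weight=1.0` this sketch coincides with the ordinary mean squared error, since MSE equals squared bias plus variance; a smaller `var_weight` penalizes bias relatively more.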

CL, Nov 15, 2022
Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain Question Answering

Wanrong He, Andrew Mao, Jordan Boyd-Graber

For humans and computers, the first step in answering an open-domain question is retrieving a set of relevant documents from a large corpus. However, the strategies that computers use fundamentally differ from those of humans. To better understand these differences, we design a gamified interface for data collection -- Cheater's Bowl -- where a human answers complex questions with access to both traditional and modern search tools. We collect a dataset of human search sessions, analyze human search strategies, and compare them to state-of-the-art multi-hop QA models. Humans query logically, apply dynamic search chains, and use world knowledge to boost searching. We demonstrate how human queries can improve the accuracy of existing systems and propose improving the future design of QA models.

CL, Jan 29, 2024
Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

Zongxia Li, Andrew Mao, Daniel Stephens et al.

Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention. Automated evaluation metrics such as coherence are often used; however, their validity has been questioned for neural topic models (NTMs), and they can overlook a model's benefits in real-world applications. To this end, we conduct the first evaluation of neural, supervised, and classical topic models in an interactive, task-based setting. We combine topic models with a classifier and test their ability to help humans conduct content analysis and document annotation. In simulated, real-user, and expert pilot studies, the Contextual Neural Topic Model does the best on cluster evaluation metrics and human evaluations; however, LDA is competitive with two other NTMs under our simulated experiment and user study results, contrary to what coherence scores suggest. We show that current automated metrics do not provide a complete picture of topic modeling capabilities, but the right choice of NTMs can be better than classical models on practical tasks.

IR, Oct 29, 2013
Capturing Variation and Uncertainty in Human Judgment

Andrew Mao, Hossein Azari Soufiani, Yiling Chen et al.

The well-studied problem of statistical rank aggregation has been applied to comparing sports teams, information retrieval, and most recently to data generated by human judgment. Such human-generated rankings may be substantially different from traditional statistical ranking data. In this work, we show that a recently proposed generalized random utility model reveals distinctive patterns in human judgment across three different domains, and provides a succinct representation of variance in both population preferences and imperfect perception. In contrast, we also show that classical statistical ranking models fail to capture important features from human-generated input. Our work motivates the use of more flexible ranking models for representing and describing the collective preferences or decision-making of human participants.