Harry Hochheiser

CL
h-index15
7papers
1,532citations
Novelty27%
AI Score40

7 Papers

LGApr 20Code
IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem

Aniruddha Adiga, Jingyuan Chou, Anshul Chiranth et al.

Epidemic forecasting has become an integral part of real-time infectious disease outbreak response. While collaborative ensembles composed of statistical and machine learning models have become the norm for real-time forecasting, standardized benchmark datasets for evaluating such methods are lacking. Further, there is limited understanding on performance of these methods for novel outbreaks with limited historical data. In this paper, we propose IDOBE, a curated collection of epidemiological time series focused on outbreak forecasting. IDOBE compiles from multiple data repositories spanning over a century of surveillance and across U.S. states and global locations. We perform derivative-based segmentation to generate over 10,000 outbreaks covering multiple outcomes such as cases and hospitalizations for 13 diseases. We consider a variety of information-theoretic and distributional measures to quantify the epidemiological diversity of the dataset. Finally, we perform multi-horizon short-term forecasting (1- to 4-week-ahead) through the progression of the outbreak using 11 baseline models and report on their performance. In addition to standard metrics such as NMSE and MAPE for point forecasts, we include probabilistic scoring rules such as Normalized Weighted Interval Score (NWIS) to quantify the performance. We find that MLP-based methods have the most robust performance, with statistical methods having a slight edge during the pre-peak phase. IDOBE dataset along with baselines are released publicly on https://github.com/NSSAC/IDOBE to enable standardized, reproducible benchmarking of outbreak forecasting methods.

CYJul 14, 2022
Areas of Strategic Visibility: Disability Bias in Biometrics

Jennifer Mankoff, Devva Kasnitz, Disability Studies et al.

This response to the RFI considers the potential for biometrics to help or harm disabled people2. Biometrics are already integrated into many aspects of daily life, from airport travel to mobile phone use. Yet many of these systems are not accessible to people who experience different kinds of disability exclusion . Different personal characteristics may impact any or all of the physical (DNA, fingerprints, face or retina) and behavioral (gesture, gait, voice) characteristics listed in the RFI as examples of biometric signals.

AIJun 16, 2022
Definition drives design: Disability models and mechanisms of bias in AI technologies

Denis Newman-Griffis, Jessica Sage Rauchberg, Rahaf Alharbi et al.

The increasing deployment of artificial intelligence (AI) tools to inform decision making across diverse areas including healthcare, employment, social benefits, and government policy, presents a serious risk for disabled people, who have been shown to face bias in AI implementations. While there has been significant work on analysing and mitigating algorithmic bias, the broader mechanisms of how bias emerges in AI applications are not well understood, hampering efforts to address bias where it begins. In this article, we illustrate how bias in AI-assisted decision making can arise from a range of specific design decisions, each of which may seem self-contained and non-biasing when considered separately. These design decisions include basic problem formulation, the data chosen for analysis, the use the AI technology is put to, and operational design elements in addition to the core algorithmic design. We draw on three historical models of disability common to different decision-making settings to demonstrate how differences in the definition of disability can lead to highly distinct decisions on each of these aspects of design, leading in turn to AI technologies with a variety of biases and downstream effects. We further show that the potential harms arising from inappropriate definitions of disability in fundamental design stages are further amplified by a lack of transparency and disabled participation throughout the AI design process. Our analysis provides a framework for critically examining AI technologies in decision-making contexts and guiding the development of a design praxis for disability-related AI analytics. We put forth this article to provide key questions to facilitate disability-led design and participatory development to produce more fair and equitable AI technologies in disability-related contexts.

CLMar 19, 2021Code
TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer et al.

Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence.

LGFeb 3, 2024
Online Transfer Learning for RSV Case Detection

Yiming Sun, Yuhe Gao, Runxue Bao et al.

Transfer learning has become a pivotal technique in machine learning and has proven to be effective in various real-world applications. However, utilizing this technique for classification tasks with sequential data often faces challenges, primarily attributed to the scarcity of class labels. To address this challenge, we introduce Multi-Source Adaptive Weighting (MSAW), an online multi-source transfer learning method. MSAW integrates a dynamic weighting mechanism into an ensemble framework, enabling automatic adjustment of weights based on the relevance and contribution of each source (representing historical knowledge) and target model (learning from newly acquired data). We demonstrate the effectiveness of MSAW by applying it to detect Respiratory Syncytial Virus cases within Emergency Department visits, utilizing multiple years of electronic health records from the University of Pittsburgh Medical Center. Our method demonstrates performance improvements over many baselines, including refining pre-trained models with online learning as well as three static weighting approaches, showing MSAW's capacity to integrate historical knowledge with progressively accumulated new data. This study indicates the potential of online transfer learning in healthcare, particularly for developing machine learning models that dynamically adapt to evolving situations where new data is incrementally accumulated.

CLApr 16, 2021
Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research

Denis Newman-Griffis, Jill Fain Lehman, Carolyn Rosé et al.

Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.

HCJul 6, 2017
An Interactive Tool for Natural Language Processing on Clinical Text

Gaurav Trivedi, Phuong Pham, Wendy Chapman et al.

Natural Language Processing (NLP) systems often make use of machine learning techniques that are unfamiliar to end-users who are interested in analyzing clinical records. Although NLP has been widely used in extracting information from clinical text, current systems generally do not support model revision based on feedback from domain experts. We present a prototype tool that allows end users to visualize and review the outputs of an NLP system that extracts binary variables from clinical text. Our tool combines multiple visualizations to help the users understand these results and make any necessary corrections, thus forming a feedback loop and helping improve the accuracy of the NLP models. We have tested our prototype in a formative think-aloud user study with clinicians and researchers involved in colonoscopy research. Results from semi-structured interviews and a System Usability Scale (SUS) analysis show that the users are able to quickly start refining NLP models, despite having very little or no experience with machine learning. Observations from these sessions suggest revisions to the interface to better support review workflow and interpretation of results.