LGApr 1, 2025
Global explainability of a deep abstaining classifierSayera Dhaubhadel, Jamaludin Mohd-Yusof, Benjamin H. McMahon et al.
We present a global explainability method to characterize sources of errors in the histology prediction task of our real-world multitask convolutional neural network (MTCNN)-based deep abstaining classifier (DAC), for automated annotation of cancer pathology reports from NCI-SEER registries. Our classifier was trained and evaluated on 1.04 million hand-annotated samples and makes simultaneous predictions of cancer site, subsite, histology, laterality, and behavior for each report. The DAC framework enables the model to abstain on ambiguous reports and/or confusing classes to achieve a target accuracy on the retained (non-abstained) samples, but at the cost of decreased coverage. Requiring 97% accuracy on the histology task caused our model to retain only 22% of all samples, mostly the less ambiguous and common classes. Local explainability with the GradInp technique provided a computationally efficient way of obtaining contextual reasoning for thousands of individual predictions. Our method, involving dimensionality reduction of approximately 13000 aggregated local explanations, enabled global identification of sources of errors as hierarchical complexity among classes, label noise, insufficient information, and conflicting evidence. This suggests several strategies such as exclusion criteria, focused annotation, and reduced penalties for errors involving hierarchically related classes to iteratively improve our DAC in this complex real-world implementation.
CYDec 3, 2024
Investigating the importance of county-level characteristics in opioid-related mortality across the United StatesAndrew Deas, Adam Spannaus, Dakotah D. Maguire et al.
The opioid crisis remains a critical public health challenge in the United States. Despite national efforts which reduced opioid prescribing rates by nearly 45\% between 2011 and 2021, opioid-related overdose deaths more than tripled during the same period. This alarming trend reflects a major shift in the crisis, with illegal opioids now driving the majority of overdose deaths instead of prescription opioids. Although much attention has been given to supply-side factors fueling this transition, the underlying structural conditions that perpetuate and exacerbate opioid misuse remain less understood. Moreover, the COVID-19 pandemic intensified the opioid crisis through widespread social isolation and record-high unemployment; consequently, understanding the underlying drivers of this epidemic has become even more crucial in recent years. To address this need, our study examines the correlation between opioid-related mortality and thirteen county-level characteristics related to population traits, economic stability, and infrastructure. Leveraging a nationwide county-level dataset spanning consecutive years from 2010 to 2022, this study integrates empirical insights from exploratory data analysis with feature importance metrics derived from machine learning models. Our findings highlight critical regional characteristics strongly correlated with opioid-related mortality, emphasizing their potential roles in worsening the epidemic when their levels are high and mitigating it when their levels are low.
MLMay 15, 2023
Topological Interpretability for Deep-LearningAdam Spannaus, Heidi A. Hanson, Lynne Penberthy et al.
With the growing adoption of AI-based systems across everyday life, the need to understand their decision-making mechanisms is correspondingly increasing. The level at which we can trust the statistical inferences made from AI-based decision systems is an increasing concern, especially in high-risk systems such as criminal justice or medical diagnosis, where incorrect inferences may have tragic consequences. Despite their successes in providing solutions to problems involving real-world data, deep learning (DL) models cannot quantify the certainty of their predictions. These models are frequently quite confident, even when their solutions are incorrect. This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text by employing techniques from topological and geometric data analysis. We create a graph of a model's feature space and cluster the inputs into the graph's vertices by the similarity of features and prediction statistics. We then extract subgraphs demonstrating high-predictive accuracy for a given label. These subgraphs contain a wealth of information about features that the DL model has recognized as relevant to its decisions. We infer these features for a given label using a distance metric between probability measures, and demonstrate the stability of our method compared to the LIME and SHAP interpretability methods. This work establishes that we may gain insights into the decision mechanism of a DL model. This method allows us to ascertain if the model is making its decisions based on information germane to the problem or identifies extraneous patterns within the data.
MTRL-SCIJan 14, 2021
Materials Fingerprinting ClassificationAdam Spannaus, Kody J. H. Law, Piotr Luszczek et al.
Significant progress in many classes of materials could be made with the availability of experimentally-derived large datasets composed of atomic identities and three-dimensional coordinates. Methods for visualizing the local atomic structure, such as atom probe tomography (APT), which routinely generate datasets comprised of millions of atoms, are an important step in realizing this goal. However, state-of-the-art APT instruments generate noisy and sparse datasets that provide information about elemental type, but obscure atomic structures, thus limiting their subsequent value for materials discovery. The application of a materials fingerprinting process, a machine learning algorithm coupled with topological data analysis, provides an avenue by which here-to-fore unprecedented structural information can be extracted from an APT dataset. As a proof of concept, the material fingerprint is applied to high-entropy alloy APT datasets containing body-centered cubic (BCC) and face-centered cubic (FCC) crystal structures. A local atomic configuration centered on an arbitrary atom is assigned a topological descriptor, with which it can be characterized as a BCC or FCC lattice with near perfect accuracy, despite the inherent noise in the dataset. This successful identification of a fingerprint is a crucial first step in the development of algorithms which can extract more nuanced information, such as chemical ordering, from existing datasets of complex materials.
MLDec 4, 2018
A Stable Cardinality Distance for Topological ClassificationVasileios Maroulas, Cassie Putman Micucci, Adam Spannaus
This work incorporates topological features via persistence diagrams to classify point cloud data arising from materials science. Persistence diagrams are multisets summarizing the connectedness and holes of given data. A new distance on the space of persistence diagrams generates relevant input features for a classification algorithm for materials science data. This distance measures the similarity of persistence diagrams using the cost of matching points and a regularization term corresponding to cardinality differences between diagrams. Establishing stability properties of this distance provides theoretical justification for the use of the distance in comparisons of such diagrams. The classification scheme succeeds in determining the crystal structure of materials on noisy and sparse data retrieved from synthetic atom probe tomography experiments.