LG · Jun 9, 2021
Taxonomy of Machine Learning Safety: A Survey and Primer
Sina Mohseni, Haotao Wang, Zhiding Yu et al.
The open-world deployment of Machine Learning (ML) algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities, including limited interpretability, limited verifiability, and performance limitations. Researchers have explored different approaches to improving ML dependability, proposing new models and training techniques that reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks. However, ongoing ML research remains poorly connected to well-established safety principles. In this paper, we present a structured and comprehensive review of ML techniques that improve the dependability of ML algorithms in uncontrolled open-world settings. From this review, we propose the Taxonomy of ML Safety, which maps state-of-the-art ML techniques to key engineering safety strategies. Our taxonomy provides a safety-oriented categorization of ML techniques to guide improvements in the dependability of ML design and development. The proposed taxonomy can serve as a safety checklist that helps designers improve the coverage and diversity of the safety strategies employed in any given ML system.
CV · Jun 7, 2021
Shifting Transformation Learning for Out-of-Distribution Detection
Sina Mohseni, Arash Vahdat, Jay Yadawa
Detecting out-of-distribution (OOD) samples plays a key role in open-world and safety-critical applications such as autonomous systems and healthcare. Recently, self-supervised representation learning techniques (via contrastive learning and pretext learning) have proven effective at improving OOD detection. However, one major issue with such approaches is the choice of shifting transformations and pretext tasks, which depends on the in-domain distribution. In this paper, we propose a simple framework that leverages a shifting transformation learning setting to learn multiple shifted representations of the training set for improved OOD detection. To address the problem of selecting optimal shifting transformations and pretext tasks, we propose a simple mechanism that automatically selects the transformations and modulates their effect on representation learning without requiring any OOD training samples. In extensive experiments, we show that our simple framework outperforms state-of-the-art OOD detection models on several image datasets. We also characterize the criteria for a desirable OOD detector for real-world applications and demonstrate the efficacy of our proposed technique against state-of-the-art OOD detection techniques.
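To make the idea of scoring samples through shifted representations concrete, here is a minimal sketch of a generic transformation-prediction OOD score (in the spirit of rotation-prediction detectors). It is not the paper's learned-transformation framework. It assumes a classifier that, given a transformed input, outputs one probability per transformation index; the names `transformation_ood_score`, `make_shift`, and `toy_predictor` are hypothetical, and the toy predictor (which recognizes cyclic shifts of sorted lists) stands in for a trained network.

```python
import math
from typing import Callable, List, Sequence

Transform = Callable[[List[int]], List[int]]


def transformation_ood_score(
    x: List[int],
    transforms: Sequence[Transform],
    predict_transform_probs: Callable[[List[int]], Sequence[float]],
) -> float:
    """Average negative log-probability assigned to the transformation actually applied.

    In-distribution inputs let the predictor recognize the applied transformation
    (low score); OOD inputs confuse it (high score).
    """
    total = 0.0
    for k, transform in enumerate(transforms):
        probs = predict_transform_probs(transform(x))
        total += -math.log(max(probs[k], 1e-12))  # clamp to avoid log(0)
    return total / len(transforms)


def make_shift(k: int) -> Transform:
    # Hypothetical stand-in for image rotations: cyclic left shift by k positions.
    return lambda s: s[k:] + s[:k]


def toy_predictor(s: List[int]) -> List[float]:
    # Hypothetical stand-in for a trained network whose "training distribution"
    # is sorted lists, so the applied shift is only recoverable for such inputs.
    n = len(s)
    for k in range(n):
        undone = s[n - k:] + s[:n - k]  # undo a left shift by k
        if undone == sorted(undone):
            return [0.97 if j == k else 0.03 / (n - 1) for j in range(n)]
    return [1.0 / n] * n  # unrecognized input: uniform guess


shifts = [make_shift(k) for k in range(4)]
in_dist = transformation_ood_score([1, 2, 3, 4], shifts, toy_predictor)
out_dist = transformation_ood_score([3, 1, 4, 2], shifts, toy_predictor)
print(f"in-distribution: {in_dist:.3f}, OOD: {out_dist:.3f}")
```

A higher score flags a likely OOD input; in practice a threshold on this score would be chosen on held-out in-distribution data.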
IR · Jul 24, 2020
Machine Learning Explanations to Prevent Overtrust in Fake News Detection
Sina Mohseni, Fan Yang, Shiva Pentyala et al.
Combating fake news and the propagation of misinformation is a challenging task in the post-truth era. News feed and search algorithms can unintentionally propagate false and fabricated information at large scale, exposing users to algorithmically selected false content. Our research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news. We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms to study the effects of algorithmic transparency on end users. We present evaluation results and analysis from multiple controlled crowdsourced studies. For a deeper understanding of Explainable AI systems, we discuss interactions among user engagement, mental models, trust, and performance measures during the explanation process. The study results indicate that explanations helped participants build appropriate mental models of the intelligent assistants in different conditions and adjust their trust to account for model limitations.
LG · Dec 20, 2019
Practical Solutions for Machine Learning Safety in Autonomous Vehicles
Sina Mohseni, Mandar Pitale, Vasu Singh et al.
Autonomous vehicles rely on machine learning to solve challenging tasks in perception and motion planning. However, automotive software safety standards have not fully evolved to address the challenges of machine learning safety such as interpretability, verification, and performance limitations. In this paper, we review and organize practical machine learning safety techniques that can complement engineering safety for machine learning based software in autonomous vehicles. Our organization maps safety strategies to state-of-the-art machine learning techniques in order to enhance dependability and safety of machine learning algorithms. We also discuss security limitations and user experience aspects of machine learning components in autonomous vehicles.
CY · Jul 8, 2019
XFake: Explainable Fake News Detector with Visualizations
Fan Yang, Shiva K. Pentyala, Sina Mohseni et al.
In this demo paper, we present the XFake system, an explainable fake news detector that helps end users assess news credibility. To effectively detect and interpret the fakeness of news items, we jointly consider both attributes (e.g., speaker) and statements. Specifically, we design three frameworks: MIMIC for attribute analysis, ATTN for statement semantic analysis, and PERT for statement linguistic analysis. Beyond the explanations extracted from these frameworks, the system provides relevant supporting examples and visualizations to facilitate interpretation. We demonstrate the implemented system on a real-world dataset crawled from PolitiFact, containing thousands of verified political news items.
LG · May 19, 2019
Predicting Model Failure using Saliency Maps in Autonomous Driving Systems
Sina Mohseni, Akshay Jagadeesh, Zhangyang Wang
While machine learning systems achieve high success rates in many complex tasks, research shows they can also fail in unexpected situations. The rise of machine learning products in safety-critical industries has increased attention to evaluating model robustness and estimating failure probability in machine learning systems. In this work, we propose training a student model (a failure predictor) to predict the main model's error on each input instance from that instance's saliency map. We implement our failure predictor on an autonomous vehicle steering control system, as an example of a safety-critical application, and review its preliminary results.
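The student-model idea can be illustrated with a minimal sketch: summarize each saliency map into a feature, then fit a binary classifier that predicts whether the main model erred on that input. The single feature (share of saliency mass at the peak pixel), the use of plain logistic regression, the helper names, and the toy maps are all assumptions for illustration; the paper's student model and its input representation may differ.

```python
import math
from typing import List, Sequence

SaliencyMap = List[List[float]]


def peak_fraction(saliency: SaliencyMap) -> float:
    # Hypothetical feature: share of total saliency at the strongest pixel.
    # A low value means attention is spread out, which we assume signals risk.
    flat = [v for row in saliency for v in row]
    total = sum(flat)
    return max(flat) / total if total > 0 else 0.0


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_failure_predictor(
    features: Sequence[float], failed: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> List[float]:
    """Fit a 1-D logistic regression [bias, weight] by batch gradient descent."""
    w = [0.0, 0.0]
    n = len(features)
    for _ in range(epochs):
        grad_b = grad_w = 0.0
        for x, y in zip(features, failed):
            err = sigmoid(w[0] + w[1] * x) - y  # d(log loss)/dz
            grad_b += err
            grad_w += err * x
        w[0] -= lr * grad_b / n
        w[1] -= lr * grad_w / n
    return w


def failure_probability(w: List[float], saliency: SaliencyMap) -> float:
    return sigmoid(w[0] + w[1] * peak_fraction(saliency))


# Toy data (illustrative only): focused maps where the main model was
# correct (label 0) and diffuse maps where it erred (label 1).
maps = [
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 8, 0], [0, 0, 0]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[1, 2, 1], [1, 1, 1], [1, 1, 1]],
]
labels = [0, 0, 1, 1]
weights = train_failure_predictor([peak_fraction(m) for m in maps], labels)
for m, y in zip(maps, labels):
    print(y, round(failure_probability(weights, m), 3))
```

Keeping the student model separate from the main model means it can be trained after the fact on logged inputs and errors, without modifying the deployed network.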
SI · Apr 4, 2019
Open Issues in Combating Fake News: Interpretability as an Opportunity
Sina Mohseni, Eric Ragan, Xia Hu
Combating fake news requires a variety of defense methods. Although rumor detection and various linguistic analysis techniques are common methods for detecting false content in social media, the machine learning community could explore other feasible mitigation approaches. In this paper, we present open issues and opportunities in fake news research that need further attention. We first review different stages of the news life cycle in social media and, with three examples, discuss core vulnerabilities of news feed algorithms that lead them to propagate fake news content. We then discuss how the complexity and lack of clarity of the fake news problem limit advancements in this field. Lastly, we present research opportunities from interpretable machine learning to mitigate fake news problems with 1) interpretable fake news detection and 2) transparent news feed algorithms. We propose three dimensions of interpretability, namely algorithmic interpretability, human interpretability, and the inclusion of supporting evidence, that can benefit fake news mitigation methods in different ways.
HC · Nov 28, 2018
A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems
Sina Mohseni, Niloofar Zarei, Eric D. Ragan
The need for interpretable and accountable intelligent systems grows along with the prevalence of artificial intelligence applications used in everyday life. Explainable intelligent systems are designed to self-explain the reasoning behind system decisions and predictions, and researchers from different disciplines work together to define, design, and evaluate interpretable systems. However, scholars from different disciplines focus on different objectives and fairly independent topics of interpretable machine learning research, which makes it difficult to identify appropriate design and evaluation methodology and to consolidate knowledge across efforts. To this end, this paper presents a survey and framework intended to share knowledge and experiences of XAI design and evaluation methods across multiple disciplines. To support diverse design goals and evaluation methods in XAI research, we thoroughly review XAI-related papers in the fields of machine learning, visualization, and human-computer interaction, and we present a categorization of interpretable machine learning design goals and evaluation methods that maps the design goals of different XAI user groups to their evaluation methods. From our findings, we develop a framework with step-by-step design guidelines paired with evaluation methods to close the iterative design and evaluation cycles in multidisciplinary XAI teams. Further, we provide summarized, ready-to-use tables of evaluation methods and recommendations for different goals in XAI research.
HC · Jan 16, 2018
ProvThreads: Analytic Provenance Visualization and Segmentation
Sina Mohseni, Alyssa Pena, Eric D. Ragan
Our work aims to generate visualizations that enable meta-analysis of analytic provenance and aid understanding of analysts' strategies during exploratory text analysis. We introduce ProvThreads, a visual analytics approach that incorporates interactive topic modeling outcomes to illustrate relationships between user interactions and the data topics under investigation. ProvThreads uses a series of continuous analysis paths called topic threads to show both topic coverage and the progression of an investigation over time. As an analyst interacts with different pieces of data during the analysis, the interactions are logged and used to track the user's interest in topics over time. A line chart shows the level of interest in each of multiple topics over the duration of the analysis. We discuss how different configurations of ProvThreads can be used to reveal changes in focus throughout an analysis.
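The logging-to-line-chart step can be sketched as follows, assuming each logged interaction carries a timestamp and a document ID, and that topic modeling assigns each document a weight per topic. Interest in a topic within a time window is taken here to be the summed topic weight of the documents the analyst touched in that window; this aggregation rule, the function name `topic_interest_timeline`, and the toy topics are hypothetical, not ProvThreads' exact formulation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interaction = Tuple[float, str]  # (timestamp in seconds, document id)


def topic_interest_timeline(
    interactions: List[Interaction],
    doc_topics: Dict[str, Dict[str, float]],
    window: float,
) -> Dict[int, Dict[str, float]]:
    """Sum the topic weights of touched documents per fixed-length time window.

    Reading one topic's values across window indices gives the series a
    line chart of topic interest over time would plot. Windows with no
    interactions are omitted.
    """
    timeline: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for timestamp, doc_id in interactions:
        bucket = int(timestamp // window)
        for topic, weight in doc_topics.get(doc_id, {}).items():
            timeline[bucket][topic] += weight
    return {b: dict(topics) for b, topics in sorted(timeline.items())}


log = [(5.0, "d1"), (30.0, "d2"), (70.0, "d1")]
topics = {"d1": {"malware": 0.8, "finance": 0.2}, "d2": {"finance": 1.0}}
result = topic_interest_timeline(log, topics, window=60.0)
print(result)
```

Summing rather than averaging lets busier windows show higher overall interest; normalizing each window instead would emphasize shifts in relative focus.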
HC · Jan 16, 2018
Analytic Provenance Datasets: A Data Repository of Human Analysis Activity and Interaction Logs
Sina Mohseni, Andrew Pachuilo, Ehsanul Haque Nirjhar et al.
We present an analytic provenance data repository that can be used to study human analysis activity, thought processes, and software interaction with visual analysis tools during exploratory data analysis. We conducted a series of user studies involving exploratory data analysis scenarios with textual and cybersecurity data. Interaction logs, think-alouds, videos, and all coded data from these studies are available online for research purposes. Analysis sessions are segmented into multiple sub-task steps based on the user think-alouds, video, and audio captured during the studies. These analytic provenance datasets can be used for research involving tools and techniques for analyzing interaction logs and analysis history. Because the repository provides high-quality coded data along with interaction logs, researchers can compare algorithmic data processing techniques against the ground-truth records of analysis history.
HC · Jan 16, 2018
A Human-Grounded Evaluation Benchmark for Local Explanations of Machine Learning
Sina Mohseni, Jeremy E. Block, Eric D. Ragan
Research in interpretable machine learning proposes different computational and human-subject approaches to evaluating model saliency explanations. These approaches measure different qualities of explanations to achieve diverse goals in designing interpretable machine learning systems. In this paper, we propose a human attention benchmark for image and text domains using multi-layer human attention masks aggregated from multiple human annotators. We then present a study evaluating model saliency explanations obtained with the Grad-CAM and LIME techniques. We demonstrate our benchmark's utility for quantitative evaluation of model explanations by comparing it with human subjective ratings and with evaluations against ground-truth single-layer segmentation masks. Our results show that our threshold-agnostic evaluation method with the human attention baseline is more effective than evaluation against single-layer ground-truth object segmentation masks. Our experiments also reveal user biases in the subjective rating of model saliency explanations.
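One possible reading of a threshold-agnostic comparison against a multi-layer human attention mask is sketched below: binarize the saliency map at a sweep of thresholds, binarize the human mask at every annotator-agreement level, and average the intersection-over-union (IoU) over all pairs. This averaging scheme, the function names, and the toy 3x3 maps are assumptions for illustration; the paper's exact metric may differ.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def iou(pred: List[bool], truth: List[bool]) -> float:
    inter = sum(p and t for p, t in zip(pred, truth))
    union = sum(p or t for p, t in zip(pred, truth))
    return inter / union if union else 1.0


def mean_iou_over_thresholds(saliency: Grid, human_counts: Grid, num_thresholds: int = 10) -> float:
    """Average IoU across saliency thresholds and human-agreement layers.

    human_counts[i][j] is the number of annotators who marked pixel (i, j);
    layer k keeps the pixels marked by at least k annotators.
    """
    s = [float(v) for row in saliency for v in row]
    h = [int(v) for row in human_counts for v in row]
    max_s, max_h = max(s), max(h)
    if max_s <= 0 or max_h <= 0:
        return 0.0  # nothing salient or nothing annotated
    scores = []
    for i in range(1, num_thresholds + 1):
        t = max_s * i / (num_thresholds + 1)  # thresholds strictly inside (0, max_s)
        pred = [v > t for v in s]
        for layer in range(1, max_h + 1):
            scores.append(iou(pred, [c >= layer for c in h]))
    return sum(scores) / len(scores)


human = [[0, 1, 2], [0, 2, 3], [0, 0, 1]]
inverted = [[3 - c for c in row] for row in human]
aligned_score = mean_iou_over_thresholds(human, human)
inverted_score = mean_iou_over_thresholds(inverted, human)
print(f"aligned: {aligned_score:.3f}, inverted: {inverted_score:.3f}")
```

Averaging over both axes avoids committing to a single binarization threshold for either the explanation or the human mask, which is the property the threshold-agnostic framing is after.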