Antonios Mamalakis

GEO-PH
h-index14
5papers
196citations
Novelty41%
AI Score38

5 Papers

30.8CLApr 7
SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future

Timothy B. Higgins, Antonios Mamalakis, Chirag Agarwal

Recent advances in visual-language models (VLMs) have led to significant improvements in a plethora of complex multimodal tasks like image captioning, report generation, and visual perception. However, generating text from meteorological data is highly challenging because the atmosphere is a chaotic system that is rapidly changing at various spatial and temporal scales. Given the complexity of atmospheric phenomena, it is critical to verifiably quantify the effectiveness of existing VLMs on weather forecasting data. In this work, we present SynopticBench, a high-quality dataset consisting of 1,367,041 text samples of Area Forecast Discussions created by the National Weather Service over the continental United States paired to images of 500mb geopotential height, 2 meter temperature, and 850mb wind velocity in weather forecasts. We also present Synoptic Phenomena Alignment and Coverage Evaluation (SPACE), a novel evaluation framework that can be used to effectively estimate the quality of text descriptions of synoptic weather phenomena. Extensive experiments on generating forecast discussions using state-of-the-art VLMs show the sensitivity of existing evaluation metrics in this domain and enable further exploration into synoptic weather and climate text generation.

CVMay 16, 2024
Solving the enigma: Enhancing faithfulness and comprehensibility in explanations of deep networks

Michail Mamalakis, Antonios Mamalakis, Ingrid Agartz et al.

The accelerated progress of artificial intelligence (AI) has popularized deep learning models across various domains, yet their inherent opacity poses challenges, particularly in critical fields like healthcare, medicine, and the geosciences. Explainable AI (XAI) has emerged to shed light on these 'black box' models, aiding in deciphering their decision-making processes. However, different XAI methods often produce significantly different explanations, leading to high inter-method variability that increases uncertainty and undermines trust in deep networks' predictions. In this study, we address this challenge by introducing a novel framework designed to enhance the explainability of deep networks through a dual focus on maximizing both accuracy and comprehensibility in the explanations. Our framework integrates outputs from multiple established XAI methods and leverages a non-linear neural network model, termed the 'explanation optimizer,' to construct a unified, optimal explanation. The optimizer evaluates explanations using two key metrics: faithfulness (accuracy in reflecting the network's decisions) and complexity (comprehensibility). By balancing these, it provides accurate and accessible explanations, addressing a key XAI limitation. Experiments on multi-class and binary classification in 2D object and 3D neuroscience imaging confirm its efficacy. Our optimizer achieved faithfulness scores 155% and 63% higher than the best XAI methods in 3D and 2D tasks, respectively, while also reducing complexity for better understanding. These results demonstrate that optimal explanations based on specific quality criteria are achievable, offering a solution to the issue of inter-method variability in the current XAI literature and supporting more trustworthy deep network predictions

GEO-PHAug 19, 2022
Carefully choose the baseline: Lessons learned from applying XAI attribution methods for regression tasks in geoscience

Antonios Mamalakis, Elizabeth A. Barnes, Imme Ebert-Uphoff

Methods of eXplainable Artificial Intelligence (XAI) are used in geoscientific applications to gain insights into the decision-making strategy of Neural Networks (NNs) highlighting which features in the input contribute the most to a NN prediction. Here, we discuss our lesson learned that the task of attributing a prediction to the input does not have a single solution. Instead, the attribution results and their interpretation depend greatly on the considered baseline (sometimes referred to as reference point) that the XAI method utilizes; a fact that has been overlooked so far in the literature. This baseline can be chosen by the user or it is set by construction in the method s algorithm, often without the user being aware of that choice. We highlight that different baselines can lead to different insights for different science questions and, thus, should be chosen accordingly. To illustrate the impact of the baseline, we use a large ensemble of historical and future climate simulations forced with the SSP3-7.0 scenario and train a fully connected NN to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We then use various XAI methods and different baselines to attribute the network predictions to the input. We show that attributions differ substantially when considering different baselines, as they correspond to answering different science questions. We conclude by discussing some important implications and considerations about the use of baselines in XAI research.

GEO-PHFeb 7, 2022
Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience

Antonios Mamalakis, Elizabeth A. Barnes, Imme Ebert-Uphoff

Convolutional neural networks (CNNs) have recently attracted great attention in geoscience due to their ability to capture non-linear system behavior and extract predictive spatiotemporal patterns. Given their black-box nature however, and the importance of prediction explainability, methods of explainable artificial intelligence (XAI) are gaining popularity as a means to explain the CNN decision-making strategy. Here, we establish an intercomparison of some of the most popular XAI methods and investigate their fidelity in explaining CNN decisions for geoscientific applications. Our goal is to raise awareness of the theoretical limitations of these methods and gain insight into the relative strengths and weaknesses to help guide best practices. The considered XAI methods are first applied to an idealized attribution benchmark, where the ground truth of explanation of the network is known a priori, to help objectively assess their performance. Secondly, we apply XAI to a climate-related prediction setting, namely to explain a CNN that is trained to predict the number of atmospheric rivers in daily snapshots of climate simulations. Our results highlight several important issues of XAI methods (e.g., gradient shattering, inability to distinguish the sign of attribution, ignorance to zero input) that have previously been overlooked in our field and, if not considered cautiously, may lead to a distorted picture of the CNN decision-making strategy. We envision that our analysis will motivate further investigation into XAI fidelity and will help towards a cautious implementation of XAI in geoscience, which can lead to further exploitation of CNNs and deep learning for prediction problems.

GEO-PHMar 18, 2021
Neural Network Attribution Methods for Problems in Geoscience: A Novel Synthetic Benchmark Dataset

Antonios Mamalakis, Imme Ebert-Uphoff, Elizabeth A. Barnes

Despite the increasingly successful application of neural networks to many problems in the geosciences, their complex and nonlinear structure makes the interpretation of their predictions difficult, which limits model trust and does not allow scientists to gain physical insights about the problem at hand. Many different methods have been introduced in the emerging field of eXplainable Artificial Intelligence (XAI), which aim at attributing the network s prediction to specific features in the input domain. XAI methods are usually assessed by using benchmark datasets (like MNIST or ImageNet for image classification). However, an objective, theoretically derived ground truth for the attribution is lacking for most of these datasets, making the assessment of XAI in many cases subjective. Also, benchmark datasets specifically designed for problems in geosciences are rare. Here, we provide a framework, based on the use of additively separable functions, to generate attribution benchmark datasets for regression problems for which the ground truth of the attribution is known a priori. We generate a large benchmark dataset and train a fully connected network to learn the underlying function that was used for simulation. We then compare estimated heatmaps from different XAI methods to the ground truth in order to identify examples where specific XAI methods perform well or poorly. We believe that attribution benchmarks as the ones introduced herein are of great importance for further application of neural networks in the geosciences, and for more objective assessment and accurate implementation of XAI methods, which will increase model trust and assist in discovering new science.