AISep 21, 2023
A Comprehensive Review on Financial Explainable AIWei Jie Yeo, Wihan van der Heever, Rui Mao et al.
The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important.
NAMay 18, 2018
A comparative study on polynomial dealiasing and split form discontinuous Galerkin schemes for under-resolved turbulence computationsAndrew R. Winters, Rodrigo C. Moura, Gianmarco Mengaldo et al.
This work focuses on the accuracy and stability of high-order nodal discontinuous Galerkin (DG) methods for under-resolved turbulence computations. In particular we consider the inviscid Taylor-Green vortex (TGV) flow to analyse the implicit large eddy simulation (iLES) capabilities of DG methods at very high Reynolds numbers. The governing equations are discretised in two ways in order to suppress aliasing errors introduced into the discrete variational forms due to the under-integration of non-linear terms. The first, more straightforward way relies on consistent/over-integration, where quadrature accuracy is improved by using a larger number of integration points, consistent with the degree of the non-linearities. The second strategy, originally applied in the high-order finite difference community, relies on a split (or skew-symmetric) form of the governing equations. Different split forms are available depending on how the variables in the non-linear terms are grouped. The desired split form is then built by averaging conservative and non-conservative forms of the governing equations, although conservativity of the DG scheme is fully preserved. A preliminary analysis based on Burgers' turbulence in one spatial dimension is conducted and shows the potential of split forms in keeping the energy of higher-order polynomial modes close to the expected levels. This indicates that the favourable dealiasing properties observed from split-form approaches in more classical schemes seem to hold for DG. The remainder of the study considers a comprehensive set of (under-resolved) computations of the inviscid TGV flow and compares the accuracy and robustness of consistent/over-integration and split form discretisations based on the local Lax-Friedrichs and Roe-type Riemann solvers...
COMP-PHApr 25, 2018
Non-modal analysis of spectral element methods: Towards accurate and robust large-eddy simulationsPablo Fernandez, Rodrigo Moura, Gianmarco Mengaldo et al.
We introduce a \textit{non-modal} analysis technique that characterizes the diffusion properties of spectral element methods for linear convection-diffusion systems. While strictly speaking only valid for linear problems, the analysis is devised so that it can give critical insights on two questions: (i) Why do spectral element methods suffer from stability issues in under-resolved computations of nonlinear problems? And, (ii) why do they successfully predict under-resolved turbulent flows even without a subgrid-scale model? The answer to these two questions can in turn provide crucial guidelines to construct more robust and accurate schemes for complex under-resolved flows, commonly found in industrial applications. For illustration purposes, this analysis technique is applied to the hybridized discontinuous Galerkin methods as representatives of spectral element methods. The effect of the polynomial order, the upwinding parameter and the Péclet number on the so-called \textit{short-term diffusion} of the scheme are investigated. From a purely non-modal analysis point of view, polynomial orders between $2$ and $4$ with standard upwinding are well suited for under-resolved turbulence simulations. For lower polynomial orders, diffusion is introduced in scales that are much larger than the grid resolution. For higher polynomial orders, as well as for strong under/over-upwinding, robustness issues can be expected. The non-modal analysis results are then tested against under-resolved turbulence simulations of the Burgers, Euler and Navier-Stokes equations. While devised in the linear setting, our non-modal analysis succeeds to predict the behavior of the scheme in the nonlinear problems considered.
CLMar 5, 2023
FinXABSA: Explainable Finance through Aspect-Based Sentiment AnalysisKeane Ong, Wihan van der Heever, Ranjan Satapathy et al.
This paper presents a novel approach for explainability in financial analysis by deriving financially-explainable statistical relationships through aspect-based sentiment analysis, Pearson correlation, Granger causality & uncertainty coefficient. The proposed methodology involves constructing an aspect list from financial literature and applying aspect-based sentiment analysis on social media text to compute sentiment scores for each aspect. Pearson correlation is then applied to uncover financially explainable relationships between aspect sentiment scores and stock prices. Findings for derived relationships are made robust by applying Granger causality to determine the forecasting ability of each aspect sentiment score for stock prices. Finally, an added layer of interpretability is added by evaluating uncertainty coefficient scores between aspect sentiment scores and stock prices. This allows us to determine the aspects whose sentiment scores are most statistically significant for stock prices. Relative to other methods, our approach provides a more informative and accurate understanding of the relationship between sentiment analysis and stock prices. Specifically, this methodology enables an interpretation of the statistical relationship between aspect-based sentiment scores and stock prices, which offers explainability to AI-driven financial decision-making.
SOC-PHNov 26, 2025
AI4X Roadmap: Artificial Intelligence for the advancement of scientific pursuit and its future directionsStephen G. Dale, Nikita Kazeev, Alastair J. A. Price et al.
Artificial intelligence and machine learning are reshaping how we approach scientific discovery, not by replacing established methods but by extending what researchers can probe, predict, and design. In this roadmap we provide a forward-looking view of AI-enabled science across biology, chemistry, climate science, mathematics, materials science, physics, self-driving laboratories and unconventional computing. Several shared themes emerge: the need for diverse and trustworthy data, transferable electronic-structure and interatomic models, AI systems integrated into end-to-end scientific workflows that connect simulations to experiments and generative systems grounded in synthesisability rather than purely idealised phases. Across domains, we highlight how large foundation models, active learning and self-driving laboratories can close loops between prediction and validation while maintaining reproducibility and physical interpretability. Taken together, these perspectives outline where AI-enabled science stands today, identify bottlenecks in data, methods and infrastructure, and chart concrete directions for building AI systems that are not only more powerful but also more transparent and capable of accelerating discovery in complex real-world environments.
CYJul 3, 2024
Explainable Natural Language Processing for Corporate Sustainability AnalysisKeane Ong, Rui Mao, Ranjan Satapathy et al.
Sustainability commonly refers to entities, such as individuals, companies, and institutions, having a non-detrimental (or even positive) impact on the environment, society, and the economy. With sustainability becoming a synonym of acceptable and legitimate behaviour, it is being increasingly demanded and regulated. Several frameworks and standards have been proposed to measure the sustainability impact of corporations, including United Nations' sustainable development goals and the recently introduced global sustainability reporting framework, amongst others. However, the concept of corporate sustainability is complex due to the diverse and intricate nature of firm operations (i.e. geography, size, business activities, interlinks with other stakeholders). As a result, corporate sustainability assessments are plagued by subjectivity both within data that reflect corporate sustainability efforts (i.e. corporate sustainability disclosures) and the analysts evaluating them. This subjectivity can be distilled into distinct challenges, such as incompleteness, ambiguity, unreliability and sophistication on the data dimension, as well as limited resources and potential bias on the analyst dimension. Put together, subjectivity hinders effective cost attribution to entities non-compliant with prevailing sustainability expectations, potentially rendering sustainability efforts and its associated regulations futile. To this end, we argue that Explainable Natural Language Processing (XNLP) can significantly enhance corporate sustainability analysis. Specifically, linguistic understanding algorithms (lexical, semantic, syntactic), integrated with XAI capabilities (interpretability, explainability, faithfulness), can bridge gaps in analyst resources and mitigate subjectivity problems within data.
LGJul 29, 2024
Revisiting the robustness of post-hoc interpretability methodsJiawen Wei, Hugues Turbé, Gianmarco Mengaldo
Post-hoc interpretability methods play a critical role in explainable artificial intelligence (XAI), as they pinpoint portions of data that a trained deep learning model deemed important to make a decision. However, different post-hoc interpretability methods often provide different results, casting doubts on their accuracy. For this reason, several evaluation strategies have been proposed to understand the accuracy of post-hoc interpretability. Many of these evaluation strategies provide a coarse-grained assessment -- i.e., they evaluate how the performance of the model degrades on average by corrupting different data points across multiple samples. While these strategies are effective in selecting the post-hoc interpretability method that is most reliable on average, they fail to provide a sample-level, also referred to as fine-grained, assessment. In other words, they do not measure the robustness of post-hoc interpretability methods. We propose an approach and two new metrics to provide a fine-grained assessment of post-hoc interpretability methods. We show that the robustness is generally linked to its coarse-grained performance.
91.0CYMar 21
The data heat island effect: quantifying the impact of AI data centers in a warming worldAndrea Marinoni, Pietro Lio', Erik Cambria et al.
The strong and continuous increase of AI-based services leads to the steady proliferation of AI data centres worldwide with the unavoidable escalation of their power consumption. It is unknown how this energy demand for computational purposes will impact the surrounding environment. Here, we focus our attention on the heat dissipation of AI hyperscalers. Taking advantage of land surface temperature measurements acquired by remote sensing platforms over the last decades, we are able to obtain a robust assessment of the temperature increase recorded in the areas surrounding AI data centres globally. We estimate that the land surface temperature increases by 2°C on average after the start of operations of an AI data centre, inducing local microclimate zones, which we call the data heat island effect. We assess the impact on the communities, quantifying that more than 340 million people could be affected by this temperature increase. Our results show that the data heat island effect could have a remarkable influence on communities and regional welfare in the future, hence becoming part of the conversation around environmentally sustainable AI worldwide.
AIFeb 11
OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy OptimizationKeane Ong, Sabri Boughorbel, Luwei Xiao et al.
To develop socially intelligent AI, existing approaches typically model human behavioral dimensions (e.g., affective, cognitive, or social attributes) in isolation. Although useful, task-specific modeling often increases training costs and limits generalization across behavioral settings. Recent reasoning RL methods facilitate training a single unified model across multiple behavioral tasks, but do not explicitly address learning across different heterogeneous behavioral data. To address this gap, we introduce Heterogeneity-Aware Relative Policy Optimization (HARPO), an RL method that balances leaning across heterogeneous tasks and samples. This is achieved by modulating advantages to ensure that no single task or sample carries disproportionate influence during policy optimization. Using HARPO, we develop and release Omnisapiens-7B 2.0, a foundation model for social behavior processing. Relative to existing behavioral foundation models, Omnisapiens-7B 2.0 achieves the strongest performance across behavioral tasks, with gains of up to +16.85% and +9.37% on multitask and held-out settings respectively, while producing more explicit and robust reasoning traces. We also validate HARPO against recent RL methods, where it achieves the most consistently strong performance across behavioral tasks.
CLJan 29
Enhancing Language Models for Robust Greenwashing DetectionNeil Heinrich Braun, Keane Ong, Rui Mao et al.
Sustainability reports are critical for ESG assessment, yet greenwashing and vague claims often undermine their reliability. Existing NLP models lack robustness to these practices, typically relying on surface-level patterns that generalize poorly. We propose a parameter-efficient framework that structures LLM latent spaces by combining contrastive learning with an ordinal ranking objective to capture graded distinctions between concrete actions and ambiguous claims. Our approach incorporates gated feature modulation to filter disclosure noise and utilizes MetaGradNorm to stabilize multi-objective optimization. Experiments in cross-category settings demonstrate superior robustness over standard baselines while revealing a trade-off between representational rigidity and generalization.
CVFeb 26, 2025Code
Tell me why: Visual foundation models as self-explainable classifiersHugues Turbé, Mina Bjelogrlic, Gianmarco Mengaldo et al.
Visual foundation models (VFMs) have become increasingly popular due to their state-of-the-art performance. However, interpretability remains crucial for critical applications. In this sense, self-explainable models (SEM) aim to provide interpretable classifiers that decompose predictions into a weighted sum of interpretable concepts. Despite their promise, recent studies have shown that these explanations often lack faithfulness. In this work, we combine VFMs with a novel prototypical architecture and specialized training objectives. By training only a lightweight head (approximately 1M parameters) on top of frozen VFMs, our approach (ProtoFM) offers an efficient and interpretable solution. Evaluations demonstrate that our approach achieves competitive classification performance while outperforming existing models across a range of interpretability metrics derived from the literature. Code is available at https://github.com/hturbe/proto-fm.
45.3CDMar 14
Hierarchy of extreme-event predictability in turbulence revealed by machine learningYuxuan Yang, Chenyu Dong, Gianmarco Mengaldo
Extreme-event predictability in turbulence is strongly state dependent, yet event-by-event predictability horizons are difficult to quantify without access to governing equations or costly perturbation ensembles. Here we train an autoregressive conditional diffusion model on direct numerical simulations of the two-dimensional Kolmogorov flow and use a CRPS-based skill score to define an event-wise predictability horizon. Enstrophy extremes exhibit a pronounced hierarchy: forecast skill persists from $\approx 1$ to $> 4$ Lyapunov times across events. Spectral filtering shows that these horizons are controlled predominantly by large-scale structures. Extremes are preceded by intense strain cores organizing quadrupolar vortex packets, whose lifetime sharply separates long- from short-horizon events. These results identify coherent-structure persistence as a governing mechanism for the predictability of turbulence extremes and provide a data-driven route to diagnose predictability limits from observations.
CLMay 26, 2025Code
Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual GenerationKeane Ong, Rui Mao, Deeksha Varshney et al.
Counterfactual reasoning typically involves considering alternatives to actual events. While often applied to understand past events, a distinct form-forward counterfactual reasoning-focuses on anticipating plausible future developments. This type of reasoning is invaluable in dynamic financial markets, where anticipating market developments can powerfully unveil potential risks and opportunities for stakeholders, guiding their decision-making. However, performing this at scale is challenging due to the cognitive demands involved, underscoring the need for automated solutions. LLMs offer promise, but remain unexplored for this application. To address this gap, we introduce a novel benchmark, FIN-FORCE-FINancial FORward Counterfactual Evaluation. By curating financial news headlines and providing structured evaluation, FIN-FORCE supports LLM based forward counterfactual generation. This paves the way for scalable and automated solutions for exploring and anticipating future market developments, thereby providing structured insights for decision-making. Through experiments on FIN-FORCE, we evaluate state-of-the-art LLMs and counterfactual generation methods, analyzing their limitations and proposing insights for future research. We release the benchmark, supplementary data and all experimental codes at the following link: https://github.com/keanepotato/fin_force
CVJun 14, 2024Code
ProtoS-ViT: Visual foundation models for sparse self-explainable classificationsHugues Turbé, Mina Bjelogrlic, Gianmarco Mengaldo et al.
Prototypical networks aim to build intrinsically explainable models based on the linear summation of concepts. Concepts are coherent entities that we, as humans, can recognize and associate with a certain object or entity. However, important challenges remain in the fair evaluation of explanation quality provided by these models. This work first proposes an extensive set of quantitative and qualitative metrics which allow to identify drawbacks in current prototypical networks. It then introduces a novel architecture which provides compact explanations, outperforming current prototypical models in terms of explanation quality. Overall, the proposed architecture demonstrates how frozen pre-trained ViT backbones can be effectively turned into prototypical models for both general and domain-specific tasks, in our case biomedical image classifiers. Code is available at \url{https://github.com/hturbe/protosvit}.
AIJan 9
Explainable AI: Learning from the LearnersRicardo Vinuesa, Steven L. Brunton, Gianmarco Mengaldo
Artificial intelligence now outperforms humans in several scientific and engineering tasks, yet its internal representations often remain opaque. In this Perspective, we argue that explainable artificial intelligence (XAI), combined with causal reasoning, enables {\it learning from the learners}. Focusing on discovery, optimization and certification, we show how the combination of foundation models and explainability methods allows the extraction of causal mechanisms, guides robust design and control, and supports trust and accountability in high-stakes applications. We discuss challenges in faithfulness, generalization and usability of explanations, and propose XAI as a unifying framework for human-AI collaboration in science and engineering.
LGNov 13, 2025
eXIAA: eXplainable Injections for Adversarial AttackLeonardo Pesce, Jiawen Wei, Gianmarco Mengaldo
Post-hoc explainability methods are a subset of Machine Learning (ML) that aim to provide a reason for why a model behaves in a certain way. In this paper, we show a new black-box model-agnostic adversarial attack for post-hoc explainable Artificial Intelligence (XAI), particularly in the image domain. The goal of the attack is to modify the original explanations while being undetected by the human eye and maintain the same predicted class. In contrast to previous methods, we do not require any access to the model or its weights, but only to the model's computed predictions and explanations. Additionally, the attack is accomplished in a single step while significantly changing the provided explanations, as demonstrated by empirical evaluation. The low requirements of our method expose a critical vulnerability in current explainability methods, raising concerns about their reliability in safety-critical applications. We systematically generate attacks based on the explanations generated by post-hoc explainability methods (saliency maps, integrated gradients, and DeepLIFT SHAP) for pretrained ResNet-18 and ViT-B16 on ImageNet. The results show that our attacks could lead to dramatically different explanations without changing the predictive probabilities. We validate the effectiveness of our attack, compute the induced change based on the explanation with mean absolute difference, and verify the closeness of the original image and the corrupted one with the Structural Similarity Index Measure (SSIM).
CLFeb 20, 2025
Towards Robust ESG Analysis Against Greenwashing Risks: Aspect-Action Analysis with Cross-Category GeneralizationKeane Ong, Rui Mao, Deeksha Varshney et al.
Sustainability reports are key for evaluating companies' environmental, social and governance, ESG performance, but their content is increasingly obscured by greenwashing - sustainability claims that are misleading, exaggerated, and fabricated. Yet, existing NLP approaches for ESG analysis lack robustness against greenwashing risks, often extracting insights that reflect misleading or exaggerated sustainability claims rather than objective ESG performance. To bridge this gap, we introduce A3CG - Aspect-Action Analysis with Cross-Category Generalization, as a novel dataset to improve the robustness of ESG analysis amid the prevalence of greenwashing. By explicitly linking sustainability aspects with their associated actions, A3CG facilitates a more fine-grained and transparent evaluation of sustainability claims, ensuring that insights are grounded in verifiable actions rather than vague or misleading rhetoric. Additionally, A3CG emphasizes cross-category generalization. This ensures robust model performance in aspect-action analysis even when companies change their reports to selectively favor certain sustainability areas. Through experiments on A3CG, we analyze state-of-the-art supervised models and LLMs, uncovering their limitations and outlining key directions for future research.
CLJan 27, 2025
ESGSenticNet: A Neurosymbolic Knowledge Base for Corporate Sustainability AnalysisKeane Ong, Rui Mao, Deeksha Varshney et al.
Evaluating corporate sustainability performance is essential to drive sustainable business practices, amid the need for a more sustainable economy. However, this is hindered by the complexity and volume of corporate sustainability data (i.e. sustainability disclosures), not least by the effectiveness of the NLP tools used to analyse them. To this end, we identify three primary challenges - immateriality, complexity, and subjectivity, that exacerbate the difficulty of extracting insights from sustainability disclosures. To address these issues, we introduce ESGSenticNet, a publicly available knowledge base for sustainability analysis. ESGSenticNet is constructed from a neurosymbolic framework that integrates specialised concept parsing, GPT-4o inference, and semi-supervised label propagation, together with a hierarchical taxonomy. This approach culminates in a structured knowledge base of 44k knowledge triplets - ('halve carbon emission', supports, 'emissions control'), for effective sustainability analysis. Experiments indicate that ESGSenticNet, when deployed as a lexical method, more effectively captures relevant and actionable sustainability information from sustainability disclosures compared to state of the art baselines. Besides capturing a high number of unique ESG topic terms, ESGSenticNet outperforms baselines on the ESG relatedness and ESG action orientation of these terms by 26% and 31% respectively. These metrics describe the extent to which topic terms are related to ESG, and depict an action toward ESG. Moreover, when deployed as a lexical method, ESGSenticNet does not require any training, possessing a key advantage in its simplicity for non-technical stakeholders.
LGApr 15, 2025
Dynamical errors in machine learning forecastsZhou Fang, Gianmarco Mengaldo
In machine learning forecasting, standard error metrics such as mean absolute error (MAE) and mean squared error (MSE) quantify discrepancies between predictions and target values. However, these metrics do not directly evaluate the physical and/or dynamical consistency of forecasts, an increasingly critical concern in scientific and engineering applications. Indeed, a fundamental yet often overlooked question is whether machine learning forecasts preserve the dynamical behavior of the underlying system. Addressing this issue is essential for assessing the fidelity of machine learning models and identifying potential failure modes, particularly in applications where maintaining correct dynamical behavior is crucial. In this work, we investigate the relationship between standard forecasting error metrics, such as MAE and MSE, and the dynamical properties of the underlying system. To achieve this goal, we use two recently developed dynamical indices: the instantaneous dimension ($d$), and the inverse persistence ($θ$). Our results indicate that larger forecast errors -- e.g., higher MSE -- tend to occur in states with higher $d$ (higher complexity) and higher $θ$ (lower persistence). To further assess dynamical consistency, we propose error metrics based on the dynamical indices that measure the discrepancy of the forecasted $d$ and $θ$ versus their correct values. Leveraging these dynamical indices-based metrics, we analyze direct and recursive forecasting strategies for three canonical datasets -- Lorenz, Kuramoto-Sivashinsky equation, and Kolmogorov flow -- as well as a real-world weather forecasting task. Our findings reveal substantial distortions in dynamical properties in ML forecasts, especially for long forecast lead times or long recursive simulations, providing complementary information on ML forecast fidelity that can be used to improve ML models.
AIOct 6, 2025
Human Behavior Atlas: Benchmarking Unified Psychological and Social Behavior UnderstandingKeane Ong, Wei Dai, Carol Li et al.
Using intelligent systems to perceive psychological and social behaviors, that is, the underlying affective, cognitive, and pathological states that are manifested through observable behaviors and social interactions, remains a challenge due to their complex, multifaceted, and personalized nature. Existing work tackling these dimensions through specialized datasets and single-task systems often miss opportunities for scalability, cross-task transfer, and broader generalization. To address this gap, we curate Human Behavior Atlas, a unified benchmark of diverse behavioral tasks designed to support the development of unified models for understanding psychological and social behaviors. Human Behavior Atlas comprises over 100,000 samples spanning text, audio, and visual modalities, covering tasks on affective states, cognitive states, pathologies, and social processes. Our unification efforts can reduce redundancy and cost, enable training to scale efficiently across tasks, and enhance generalization of behavioral features across domains. On Human Behavior Atlas, we train three models: OmniSapiens-7B SFT, OmniSapiens-7B BAM, and OmniSapiens-7B RL. We show that training on Human Behavior Atlas enables models to consistently outperform existing multimodal LLMs across diverse behavioral tasks. Pretraining on Human Behavior Atlas also improves transfer to novel behavioral datasets; with the targeted use of behavioral descriptors yielding meaningful performance gains.
CLApr 27, 2025
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather AnalyticsDeeksha Varshney, Keane Ong, Rui Mao et al.
Accurate assessments of extreme weather events are vital for research and policy, yet localized and granular data remain scarce in many parts of the world. This data gap limits our ability to analyze potential outcomes and implications of extreme weather events, hindering effective decision-making. Large Language Models (LLMs) can process vast amounts of unstructured text data, extract meaningful insights, and generate detailed assessments by synthesizing information from multiple sources. Furthermore, LLMs can seamlessly transfer their general language understanding to smaller models, enabling these models to retain key knowledge while being fine-tuned for specific tasks. In this paper, we propose Extreme Weather Reasoning-Aware Alignment (EWRA), a method that enhances small language models (SLMs) by incorporating structured reasoning paths derived from LLMs, and ExtremeWeatherNews, a large dataset of extreme weather event-related news articles. EWRA and ExtremeWeatherNews together form the overall framework, ClimaEmpact, that focuses on addressing three critical extreme-weather tasks: categorization of tangible vulnerabilities/impacts, topic labeling, and emotion analysis. By aligning SLMs with advanced reasoning strategies on ExtremeWeatherNews (and its derived dataset ExtremeAlign used specifically for SLM alignment), EWRA improves the SLMs' ability to generate well-grounded and domain-specific responses for extreme weather analytics. Our results show that the approach proposed guides SLMs to output domain-aligned responses, surpassing the performance of task-specific models and offering enhanced real-world applicability for extreme weather analytics.
AO-PHFeb 18, 2025
CondensNet: Enabling stable long-term climate simulations via hybrid deep learning models with adaptive physical constraintsXin Wang, Juntao Yang, Jeff Adie et al. · nvidia
Accurate and efficient climate simulations are crucial for understanding Earth's evolving climate. However, current general circulation models (GCMs) face challenges in capturing unresolved physical processes, such as cloud and convection. A common solution is to adopt cloud resolving models, that provide more accurate results than the standard subgrid parametrisation schemes typically used in GCMs. However, cloud resolving models, also referred to as super paramtetrizations, remain computationally prohibitive. Hybrid modeling, which integrates deep learning with equation-based GCMs, offers a promising alternative but often struggles with long-term stability and accuracy issues. In this work, we find that water vapor oversaturation during condensation is a key factor compromising the stability of hybrid models. To address this, we introduce CondensNet, a novel neural network architecture that embeds a self-adaptive physical constraint to correct unphysical condensation processes. CondensNet effectively mitigates water vapor oversaturation, enhancing simulation stability while maintaining accuracy and improving computational efficiency compared to super parameterization schemes. We integrate CondensNet into a GCM to form PCNN-GCM (Physics-Constrained Neural Network GCM), a hybrid deep learning framework designed for long-term stable climate simulations in real-world conditions, including ocean and land. PCNN-GCM represents a significant milestone in hybrid climate modeling, as it shows a novel way to incorporate physical constraints adaptively, paving the way for accurate, lightweight, and stable long-term climate simulations.
CYSep 22, 2025
Explainability matters: The effect of liability rules on the healthcare sectorJiawen Wei, Elena Verona, Andrea Bertolini et al.
Explainability, the capability of an artificial intelligence system (AIS) to explain its outcomes in a manner that is comprehensible to human beings at an acceptable level, has been deemed essential for critical sectors, such as healthcare. Is it really the case? In this perspective, we consider two extreme cases, ``Oracle'' (without explainability) versus ``AI Colleague'' (with explainability) for a thorough analysis. We discuss how the level of automation and explainability of AIS can affect the determination of liability among the medical practitioner/facility and manufacturer of AIS. We argue that explainability plays a crucial role in setting a responsibility framework in healthcare, from a legal standpoint, to shape the behavior of all involved parties and mitigate the risk of potential defensive medicine practices.
LGMar 11, 2025
XAI4Extremes: An interpretable machine learning framework for understanding extreme-weather precursors under climate changeJiawen Wei, Aniruddha Bora, Vivek Oommen et al.
Extreme weather events are increasing in frequency and intensity due to climate change. This, in turn, is exacting a significant toll in communities worldwide. While prediction skills are increasing with advances in numerical weather prediction and artificial intelligence tools, extreme weather still present challenges. More specifically, identifying the precursors of such extreme weather events and how these precursors may evolve under climate change remain unclear. In this paper, we propose to use post-hoc interpretability methods to construct relevance weather maps that show the key extreme-weather precursors identified by deep learning models. We then compare this machine view with existing domain knowledge to understand whether deep learning models identified patterns in data that may enrich our understanding of extreme-weather precursors. We finally bin these relevant maps into different multi-year time periods to understand the role that climate change is having on these precursors. The experiments are carried out on Indochina heatwaves, but the methodology can be readily extended to other extreme weather events worldwide.
AIJun 15, 2024
Explain the Black Box for the Sake of Science: the Scientific Method in the Era of Generative Artificial IntelligenceGianmarco Mengaldo
The scientific method is the cornerstone of human progress across all branches of the natural and applied sciences, from understanding the human body to explaining how the universe works. The scientific method is based on identifying systematic rules or principles that describe the phenomenon of interest in a reproducible way that can be validated through experimental evidence. In the era of generative artificial intelligence, there are discussions on how AI systems may discover new knowledge. We argue that human complex reasoning for scientific discovery remains of vital importance, at least before the advent of artificial general intelligence. Yet, AI can be leveraged for scientific discovery via explainable AI. More specifically, knowing the `principles' the AI systems used to make decisions can be a point of contact with domain experts and scientists, that can lead to divergent or convergent views on a given scientific problem. Divergent views may spark further scientific investigations leading to interpretability-guided explanations (IGEs), and possibly to new scientific knowledge. We define this field as Explainable AI for Science, where domain experts -- potentially assisted by generative AI -- formulate scientific hypotheses and explanations based on the interpretability of a predictive AI system.
LGFeb 11, 2022
Evaluation of post-hoc interpretability methods in time-series classificationHugues Turbé, Mina Bjelogrlic, Christian Lovis et al.
Post-hoc interpretability methods are critical tools to explain neural-network results. Several post-hoc methods have emerged in recent years, but when applied to a given task, they produce different results, raising the question of which method is the most suitable to provide correct post-hoc interpretability. To understand the performance of each method, quantitative evaluation of interpretability methods is essential. However, currently available frameworks have several drawbacks which hinders the adoption of post-hoc interpretability methods, especially in high-risk sectors. In this work, we propose a framework with quantitative metrics to assess the performance of existing post-hoc interpretability methods in particular in time series classification. We show that several drawbacks identified in the literature are addressed, namely dependence on human judgement, retraining, and shift in the data distribution when occluding samples. We additionally design a synthetic dataset with known discriminative features and tunable complexity. The proposed methodology and quantitative metrics can be used to understand the reliability of interpretability methods results obtained in practical applications. In turn, they can be embedded within operational workflows in critical fields that require accurate interpretability results for e.g., regulatory policies.