26.8LGMay 11
Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local LevelsOleksii Furman, Patryk Wielopolski, Łukasz Lenkiewicz et al.
The growing complexity of AI systems has intensified the need for transparency through Explainable AI (XAI). Counterfactual explanations (CFs) offer actionable "what-if" scenarios on three levels: Local CFs providing instance-specific insights, Global CFs addressing broader trends, and Group-wise CFs (GWCFs) striking a balance and revealing patterns within cohesive groups. Despite the availability of methods for each granularity level, the field lacks a unified method that integrates these complementary approaches. We address this limitation by proposing a gradient-based optimization method for differentiable models that generates Local, Global, and Group-wise Counterfactual Explanations in a unified manner. We especially enhance GWCF generation by combining instance grouping and counterfactual generation into a single efficient process, replacing traditional two-step methods. Moreover, to ensure trustworthiness, we innovatively introduce the integration of plausibility criteria into the GWCF domain, making explanations both valid and realistic. Our results demonstrate the method's effectiveness in balancing validity, proximity, and plausibility while optimizing group granularity, with practical utility validated through practical use cases.
41.8LGMay 11
V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy PredictionMarcin Kostrzewa, Sebastian Tomczak, Roman Furman et al.
Corporate bankruptcy prediction is a high-stakes financial task characterized by severe class imbalance and multi-horizon forecasting demands. Public datasets supporting it remain scarce and small: widely used free benchmarks contain between 6,000 and 80,000 company-year observations, while larger resources are behind subscription paywalls. To address this gap, we introduce V4FinBench, a benchmark of over one million company-year records from the Visegràd Group (V4) economies (2006-2021), with 131 financial and non-financial features, six prediction horizons, and a composite distress criterion jointly capturing solvency, profitability, and liquidity deterioration. V4FinBench is designed to support the evaluation of tabular and foundation-model methods under realistic class imbalance, with positive rates between 0.19% and 0.36%. We provide reference evaluations of standard tabular baselines, finetuned TabPFN, and QLoRA-finetuned Llama-3-8B. With imbalance-aware finetuning, TabPFN matches or exceeds gradient boosting at longer time horizons on both $F_1$-score and ROC-AUC. In contrast, Llama-3-8B trails gradient boosting on ROC-AUC at every horizon and is generally weaker on $F_1$-score, with the gap widening sharply beyond the immediate horizon. In an external evaluation on the American Bankruptcy Dataset, the V4FinBench-finetuned TabPFN checkpoint improves over vanilla TabPFN, suggesting that adaptation captures transferable financial-distress structure rather than only V4-specific patterns. V4FinBench is publicly released to support further evaluation and development of prediction methods on realistic financial data.
LGMar 16, 2025Code
HyConEx: Hypernetwork classifier with counterfactual explanationsPatryk Marszałek, Ulvi Movsum-zada, Oleksii Furman et al.
In recent years, there has been a growing interest in explainable AI methods. We want not only to make accurate predictions using sophisticated neural networks but also to understand what the model's decision is based on. One of the fundamental levels of interpretability is to provide counterfactual examples explaining the rationale behind the decision and identifying which features, and to what extent, must be modified to alter the model's outcome. To address these requirements, we introduce HyConEx, a classification model based on deep hypernetworks specifically designed for tabular data. Owing to its unique architecture, HyConEx not only provides class predictions but also delivers local interpretations for individual data samples in the form of counterfactual examples that steer a given sample toward an alternative class. While many explainable methods generated counterfactuals for external models, there have been no interpretable classifiers simultaneously producing counterfactual samples so far. HyConEx achieves competitive performance on several metrics assessing classification accuracy and fulfilling the criteria of a proper counterfactual attack. This makes HyConEx a distinctive deep learning model, which combines predictions and explainers as an all-in-one neural network. The code is available at https://github.com/gmum/HyConEx.
LGFeb 19
CounterFlowNet: From Minimal Changes to Meaningful Counterfactual ExplanationsOleksii Furman, Patryk Marszałek, Jan Masłowski et al.
Counterfactual explanations (CFs) provide human-interpretable insights into model's predictions by identifying minimal changes to input features that would alter the model's output. However, existing methods struggle to generate multiple high-quality explanations that (1) affect only a small portion of the features, (2) can be applied to tabular data with heterogeneous features, and (3) are consistent with the user-defined constraints. We propose CounterFlowNet, a generative approach that formulates CF generation as sequential feature modification using conditional Generative Flow Networks (GFlowNet). CounterFlowNet is trained to sample CFs proportionally to a user-specified reward function that can encode key CF desiderata: validity, sparsity, proximity and plausibility, encouraging high-quality explanations. The sequential formulation yields highly sparse edits, while a unified action space seamlessly supports continuous and categorical features. Moreover, actionability constraints, such as immutability and monotonicity of features, can be enforced at inference time via action masking, without retraining. Experiments on eight datasets under two evaluation protocols demonstrate that CounterFlowNet achieves superior trade-offs between validity, sparsity, plausibility, and diversity with full satisfaction of the given constraints.
LGNov 20, 2025
Are Foundation Models Useful for Bankruptcy Prediction?Marcin Kostrzewa, Oleksii Furman, Roman Furman et al.
Foundation models have shown promise across various financial applications, yet their effectiveness for corporate bankruptcy prediction remains systematically unevaluated against established methods. We study bankruptcy forecasting using Llama-3.3-70B-Instruct and TabPFN, evaluated on large, highly imbalanced datasets of over one million company records from the Visegrád Group. We provide the first systematic comparison of foundation models against classical machine learning baselines for this task. Our results show that models such as XGBoost and CatBoost consistently outperform foundation models across all prediction horizons. LLM-based approaches suffer from unreliable probability estimates, undermining their use in risk-sensitive financial settings. TabPFN, while competitive with simpler baselines, requires substantial computational resources with costs not justified by performance gains. These findings suggest that, despite their generality, current foundation models remain less effective than specialized methods for bankruptcy forecasting.
LGOct 7, 2025
OBSR: Open Benchmark for Spatial RepresentationsJulia Moska, Oleksii Furman, Kacper Kozaczko et al.
GeoAI is evolving rapidly, fueled by diverse geospatial datasets like traffic patterns, environmental data, and crowdsourced OpenStreetMap (OSM) information. While sophisticated AI models are being developed, existing benchmarks are often concentrated on single tasks and restricted to a single modality. As such, progress in GeoAI is limited by the lack of a standardized, multi-task, modality-agnostic benchmark for their systematic evaluation. This paper introduces a novel benchmark designed to assess the performance, accuracy, and efficiency of geospatial embedders. Our benchmark is modality-agnostic and comprises 7 distinct datasets from diverse cities across three continents, ensuring generalizability and mitigating demographic biases. It allows for the evaluation of GeoAI embedders on various phenomena that exhibit underlying geographic processes. Furthermore, we establish a simple and intuitive task-oriented model baselines, providing a crucial reference point for comparing more complex solutions.
LGMay 29, 2025
DiCoFlex: Model-agnostic diverse counterfactuals with flexible controlOleksii Furman, Ulvi Movsum-zada, Patryk Marszalek et al.
Counterfactual explanations play a pivotal role in explainable artificial intelligence (XAI) by offering intuitive, human-understandable alternatives that elucidate machine learning model decisions. Despite their significance, existing methods for generating counterfactuals often require constant access to the predictive model, involve computationally intensive optimization for each instance and lack the flexibility to adapt to new user-defined constraints without retraining. In this paper, we propose DiCoFlex, a novel model-agnostic, conditional generative framework that produces multiple diverse counterfactuals in a single forward pass. Leveraging conditional normalizing flows trained solely on labeled data, DiCoFlex addresses key limitations by enabling real-time user-driven customization of constraints such as sparsity and actionability at inference time. Extensive experiments on standard benchmark datasets show that DiCoFlex outperforms existing methods in terms of validity, diversity, proximity, and constraint adherence, making it a practical and scalable solution for counterfactual generation in sensitive decision-making domains.