DBNov 24, 2023
Counting Solutions to Conjunctive Queries: Structural and Hybrid TractabilityHubie Chen, Gianluigi Greco, Stefan Mengel et al.
Counting the number of answers to conjunctive queries is a fundamental problem in databases that, under standard assumptions, does not have an efficient solution. The issue is inherently #P-hard, extending even to classes of acyclic instances. To address this, we pinpoint tractable classes by examining the structural properties of instances and introducing the novel concept of #-hypertree decomposition. We establish the feasibility of counting answers in polynomial time for classes of queries featuring bounded #-hypertree width. Additionally, employing novel techniques from the realm of fixed-parameter computational complexity, we prove that, for bounded arity queries, the bounded #-hypertree width property precisely delineates the frontier of tractability for the counting problem. This result closes an important gap in our understanding of the complexity of such a basic problem for conjunctive queries and, equivalently, for constraint satisfaction problems (CSPs). Drawing upon #-hypertree decompositions, a ''hybrid'' decomposition method emerges. This approach leverages both the structural characteristics of the query and properties intrinsic to the input database, including keys or other (weaker) degree constraints that limit the permissible combinations of values. Intuitively, these features may introduce distinct structural properties that elude identification through the ''worst-possible database'' perspective inherent in purely structural methods.
GNMar 29, 2023
A New Deep Learning and XAI-Based Algorithm for Features Selection in GenomicsCarlo Adornetto, Gianluigi Greco
In the field of functional genomics, the analysis of gene expression profiles through Machine and Deep Learning is increasingly providing meaningful insight into a number of diseases. The paper proposes a novel algorithm to perform Feature Selection on genomic-scale data, which exploits the reconstruction capabilities of autoencoders and an ad-hoc defined Explainable Artificial Intelligence-based score in order to select the most informative genes for diagnosis, prognosis, and precision medicine. Results of the application on a Chronic Lymphocytic Leukemia dataset evidence the effectiveness of the algorithm, by identifying and suggesting a set of meaningful genes for further medical investigation.
SIMar 8
The Theory and Practice of Computing the Bus-FactorSebastiano A. Piccolo, Pasquale De Meo, Giorgio Terracina et al.
The bus-factor is a measure of project risk with respect to personnel availability, informally defined as the number of people whose sudden unavailability would cause a project to stall or experience severe delays. Despite its intuitive appeal, existing bus-factor measures rely on heterogeneous modeling assumptions, ambiguous definitions of failure, and domain-specific artifacts, limiting their generality, comparability, and ability to capture project fragmentation. In this paper, we develop a unified, domain-agnostic framework for bus-factor estimation by modeling projects as bipartite graphs of people and tasks and casting the computation of the bus-factor as a family of combinatorial optimization problems. Within this framework, we formalize and reconcile two complementary interpretations of the bus-factor, redundancy and criticality, corresponding to the Maximum Redundant Set and the Minimum Critical Set, respectively, and prove that both formulations are NP-hard. Building on this theoretical foundation, we introduce a novel bus-factor measure inspired by network robustness. Unlike prior approaches, the proposed measure captures both loss of coverage and increasing project fragmentation by tracking the largest connected set of tasks under progressive contributor removal. The resulting measure is normalized, threshold-free, and applicable across domains; we show that its exact computation is NP-hard as well. We further propose efficient linear-time approximation algorithms for all considered measures. Finally, we evaluate their behavior through a sensitivity analysis based on controlled perturbations of project structures, guided by expectations derived from project management theory. Our results show that the robustness-based measure behaves consistently with these expectations and provides a more informative and stable assessment of project risk than existing alternatives.
LGJul 23, 2025
Advancing Wildfire Risk Prediction via Morphology-Aware Curriculum Contrastive LearningFabrizio Lo Scudo, Alessio De Rango, Luca Furnari et al.
Wildfires significantly impact natural ecosystems and human health, leading to biodiversity loss, increased hydrogeological risks, and elevated emissions of toxic substances. Climate change exacerbates these effects, particularly in regions with rising temperatures and prolonged dry periods, such as the Mediterranean. This requires the development of advanced risk management strategies that utilize state-of-the-art technologies. However, in this context, the data show a bias toward an imbalanced setting, where the incidence of wildfire events is significantly lower than typical situations. This imbalance, coupled with the inherent complexity of high-dimensional spatio-temporal data, poses significant challenges for training deep learning architectures. Moreover, since precise wildfire predictions depend mainly on weather data, finding a way to reduce computational costs to enable more frequent updates using the latest weather forecasts would be beneficial. This paper investigates how adopting a contrastive framework can address these challenges through enhanced latent representations for the patch's dynamic features. We thus introduce a new morphology-based curriculum contrastive learning that mitigates issues associated with diverse regional characteristics and enables the use of smaller patch sizes without compromising performance. An experimental analysis is performed to validate the effectiveness of the proposed modeling strategies.
IVJun 24, 2024
μ-Net: A Deep Learning-Based Architecture for μ-CT SegmentationPierangela Bruno, Edoardo De Rose, Carlo Adornetto et al.
X-ray computed microtomography (μ-CT) is a non-destructive technique that can generate high-resolution 3D images of the internal anatomy of medical and biological samples. These images enable clinicians to examine internal anatomy and gain insights into the disease or anatomical morphology. However, extracting relevant information from 3D images requires semantic segmentation of the regions of interest, which is usually done manually and results time-consuming and tedious. In this work, we propose a novel framework that uses a convolutional neural network (CNN) to automatically segment the full morphology of the heart of Carassius auratus. The framework employs an optimized 2D CNN architecture that can infer a 3D segmentation of the sample, avoiding the high computational cost of a 3D CNN architecture. We tackle the challenges of handling large and high-resoluted image data (over a thousand pixels in each dimension) and a small training database (only three samples) by proposing a standard protocol for data normalization and processing. Moreover, we investigate how the noise, contrast, and spatial resolution of the sample and the training of the architecture are affected by the reconstruction technique, which depends on the number of input images. Experiments show that our framework significantly reduces the time required to segment new samples, allowing a faster microtomography analysis of the Carassius auratus heart shape. Furthermore, our framework can work with any bio-image (biological and medical) from μ-CT with high-resolution and small dataset size
AISep 15, 2012
Tractable Optimization Problems through Hypergraph-Based Structural RestrictionsGeorg Gottlob, Gianluigi Greco, Francesco Scarcello
Several variants of the Constraint Satisfaction Problem have been proposed and investigated in the literature for modelling those scenarios where solutions are associated with some given costs. Within these frameworks computing an optimal solution is an NP-hard problem in general; yet, when restricted over classes of instances whose constraint interactions can be modelled via (nearly-)acyclic graphs, this problem is known to be solvable in polynomial time. In this paper, larger classes of tractable instances are singled out, by discussing solution approaches based on exploiting hypergraph acyclicity and, more generally, structural decomposition methods, such as (hyper)tree decompositions.
AIApr 27, 2012
Magic Sets for Disjunctive Datalog ProgramsMario Alviano, Wolfgang Faber, Gianluigi Greco et al.
In this paper, a new technique for the optimization of (partially) bound queries over disjunctive Datalog programs with stratified negation is presented. The technique exploits the propagation of query bindings and extends the Magic Set (MS) optimization technique. An important feature of disjunctive Datalog is nonmonotonicity, which calls for nondeterministic implementations, such as backtracking search. A distinguishing characteristic of the new method is that the optimization can be exploited also during the nondeterministic phase. In particular, after some assumptions have been made during the computation, parts of the program may become irrelevant to a query under these assumptions. This allows for dynamic pruning of the search space. In contrast, the effect of the previously defined MS methods for disjunctive Datalog is limited to the deterministic portion of the process. In this way, the potential performance gain by using the proposed method can be exponential, as could be observed empirically. The correctness of MS is established thanks to a strong relationship between MS and unfounded sets that has not been studied in the literature before. This knowledge allows for extending the method also to programs with stratified negation in a natural way. The proposed method has been implemented in DLV and various experiments have been conducted. Experimental results on synthetic data confirm the utility of MS for disjunctive Datalog, and they highlight the computational gain that may be obtained by the new method w.r.t. the previously proposed MS methods for disjunctive Datalog programs. Further experiments on real-world data show the benefits of MS within an application scenario that has received considerable attention in recent years, the problem of answering user queries over possibly inconsistent databases originating from integration of autonomous sources of information.