AI · Aug 11, 2023
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo et al.
Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.
SI · May 23
Generalized L-Modularity for Community Detection Beyond Simple Temporal Networks
Victor Brabant, Angela Bonifati, Remy Cazabet
Detecting communities in networks is essential for understanding the mesoscopic organization of complex systems. Interactions in most real-world networks evolve over time and exhibit diverse modalities: instantaneous events, continuous contacts that persist over intervals, and delayed interactions where source and destination are temporally separated, as observed in transportation processes. Additionally, interactions may be directed, weighted, or involve multiple node types. Existing methods for community detection in temporal networks typically handle only limited subsets of these features. When applied to real-world data, they often rely on simplifying transformations, such as aggregating interactions into time windows, projecting multipartite structures onto unipartite graphs, or ignoring edge directions and weights, leading to a loss of information. In this work, we generalize Longitudinal Modularity (L-Modularity) and the LAGO algorithm into a unified framework for dynamic community detection in complex link streams. Experiments on three real-world datasets demonstrate that our approach discovers meaningful communities in temporal networks with diverse interaction types.
CL · Jul 4, 2025
Graph Repairs with Large Language Models: An Empirical Study
Hrishikesh Terdalkar, Angela Bonifati, Andrea Mauri
Property graphs are widely used in domains such as healthcare, finance, and social networks, but they often contain errors due to inconsistencies, missing data, or schema violations. Traditional rule-based and heuristic-driven graph repair methods are limited in their adaptability, as they need to be tailored to each dataset. On the other hand, interactive human-in-the-loop approaches may become infeasible for large graphs, as the cost of involving users, in both time and effort, becomes too high. Recent advancements in Large Language Models (LLMs) present new opportunities for automated graph repair by leveraging contextual reasoning and their access to real-world knowledge. We evaluate the effectiveness of six open-source LLMs in repairing property graphs. We assess repair quality, computational cost, and model-specific performance. Our experiments show that LLMs have the potential to detect and correct errors, with varying degrees of accuracy and efficiency. We discuss the strengths, limitations, and challenges of LLM-driven graph repair and outline future research directions for improving scalability and interpretability.
CL · May 14, 2024
Assisted Debate Builder with Large Language Models
Elliot Faugier, Frédéric Armetta, Angela Bonifati et al.
We introduce ADBL2, an assisted debate builder tool. It is based on the capability of large language models to generalise and perform relation-based argument mining in a wide variety of domains. It is the first open-source tool that leverages relation-based mining for (1) the verification of pre-established relations in a debate and (2) the assisted creation of new arguments by means of large language models. ADBL2 is highly modular and can work with any open-source large language model used as a plugin. As a by-product, we also provide the first fine-tuned Mistral-7B large language model for relation-based argument mining, usable by ADBL2, which outperforms existing approaches for this task with an overall F1-score of 90.59% across all domains.
SI · Aug 29, 2024
Longitudinal Modularity, a Modularity for Link Streams
Victor Brabant, Yasaman Asgari, Pierre Borgnat et al.
Temporal networks are commonly used to model real-life phenomena. When these phenomena represent interactions and are captured at a fine-grained temporal resolution, they are modeled as link streams. Community detection is an essential network analysis task. Although many methods exist for static networks, and some methods have been developed for temporal networks represented as sequences of snapshots, few works can handle link streams. This article introduces the first adaptation of the well-known Modularity quality function to link streams. Unlike existing methods, it is independent of the time scale of analysis. After introducing the quality function, and its relation to existing static and dynamic definitions of Modularity, we show experimentally its relevance for dynamic community evaluation.
LG · Feb 18, 2025
$k$-Graph: A Graph Embedding for Interpretable Time Series Clustering
Paul Boniol, Donato Tiano, Angela Bonifati et al.
Time series clustering poses a significant challenge with diverse applications across domains. A prominent drawback of existing solutions lies in their limited interpretability, often confined to presenting users with centroids. In addressing this gap, our work presents $k$-Graph, an unsupervised method explicitly crafted to augment interpretability in time series clustering. Leveraging a graph representation of time series subsequences, $k$-Graph constructs multiple graph representations based on different subsequence lengths. This feature accommodates variable-length time series without requiring users to predetermine subsequence lengths. Our experimental results reveal that $k$-Graph outperforms current state-of-the-art time series clustering algorithms in accuracy, while providing users with meaningful explanations and interpretations of the clustering outcomes.
LG · Jan 13
Continuous Fairness On Data Streams
Subhodeep Ghosh, Zhihui Du, Angela Bonifati et al.
We study the problem of enforcing continuous group fairness over windows in data streams. We propose a novel fairness model that ensures group fairness at a finer granularity level (referred to as block) within each sliding window. This formulation is particularly useful when the window size is large, making it desirable to enforce fairness at a finer granularity. Within this framework, we address two key challenges: efficiently monitoring whether each sliding window satisfies block-level group fairness, and reordering the current window as effectively as possible when fairness is violated. To enable real-time monitoring, we design sketch-based data structures that maintain attribute distributions with minimal overhead. We also develop optimal, efficient algorithms for the reordering task, supported by rigorous theoretical guarantees. Our evaluation on four real-world streaming scenarios demonstrates the practical effectiveness of our approach. We achieve millisecond-level processing and a throughput of approximately 30,000 queries per second on average, depending on system parameters. The stream reordering algorithm improves block-level group fairness by up to 95% in certain cases, and by 50-60% on average across datasets. A qualitative study further highlights the advantages of block-level fairness compared to window-level fairness.
SI · Oct 1, 2025
Discovering Communities in Continuous-Time Temporal Networks by Optimizing L-Modularity
Victor Brabant, Angela Bonifati, Rémy Cazabet
Community detection is a fundamental problem in network analysis, with many applications in various fields. Extending community detection to the temporal setting with exact temporal accuracy, as required by real-world dynamic data, necessitates methods specifically adapted to the temporal nature of interactions. We introduce LAGO, a novel method for uncovering dynamic communities by greedy optimization of Longitudinal Modularity, a specific adaptation of Modularity for continuous-time networks. Unlike prior approaches that rely on time discretization or assume rigid community evolution, LAGO captures the precise moments when nodes enter and exit communities. We evaluate LAGO on synthetic benchmarks and real-world datasets, demonstrating its ability to efficiently uncover temporally and topologically coherent communities.
LG · Mar 10, 2025
Graphint: Graph-based Time Series Clustering Visualisation Tool
Paul Boniol, Donato Tiano, Angela Bonifati et al.
With the exponential growth of time series data across diverse domains, there is a pressing need for effective analysis tools. Time series clustering is important for identifying patterns in these datasets. However, prevailing methods often encounter obstacles in maintaining data relationships and ensuring interpretability. We present Graphint, an innovative system based on the $k$-Graph methodology that addresses these challenges. Graphint integrates a robust time series clustering algorithm with an interactive tool for comparison and interpretation. More precisely, our system allows users to compare results against competing approaches, identify discriminative subsequences within specified datasets, and visualize the critical information utilized by $k$-Graph to generate outputs. Overall, Graphint offers a comprehensive solution for extracting actionable insights from complex temporal datasets.
DB · Apr 16, 2020
Holding a Conference Online and Live due to COVID-19
Angela Bonifati, Giovanna Guerrini, Carsten Lutz et al.
The joint EDBT/ICDT conference (International Conference on Extending Database Technology/International Conference on Database Theory) is a well-established conference series on data management, with annual meetings in the second half of March that attract 250 to 300 delegates. Three weeks before EDBT/ICDT 2020 was planned to take place in Copenhagen, the rapidly developing COVID-19 pandemic led to the decision to cancel the face-to-face event. In the interest of the research community, it was decided to move the conference online while trying to preserve as much of the real-life experience as possible. As far as we know, we are one of the first conferences that moved to a fully synchronous online experience due to the COVID-19 outbreak. By fully synchronous, we mean that participants jointly listened to presentations, had live Q&A, and attended other live events associated with the conference. In this report, we share our decisions, experiences, and lessons learned.
DB · Jan 22, 2020
Graph Generators: State of the Art and Open Challenges
Angela Bonifati, Irena Holubová, Arnau Prat-Pérez et al.
The abundance of interconnected data has fueled the design and implementation of graph generators that reproduce real-world linking properties or gauge the effectiveness of graph algorithms, techniques, and applications manipulating these data. We consider graph generation across multiple subfields, such as Semantic Web, graph databases, social networks, and community detection, along with general graphs. Despite the disparate requirements of modern graph generators throughout these communities, we analyze them under a common umbrella, covering their functionalities, practical usage, and supported operations. We argue that this classification serves the need to provide scientists, researchers, and practitioners with the right data generator for their work. This survey provides a comprehensive overview of the state-of-the-art graph generators by focusing on those that are pertinent and suitable for several data-intensive tasks. Finally, we discuss open challenges and missing requirements of current graph generators along with their future extensions to new emerging fields.
DB · Mar 5, 2015
Mapping-equivalence and oid-equivalence of single-function object-creating conjunctive queries
Angela Bonifati, Werner Nutt, Riccardo Torlone et al.
Conjunctive database queries have been extended with a mechanism for object creation to capture important applications such as data exchange, data integration, and ontology-based data access. Object creation generates new object identifiers in the result that do not belong to the set of constants in the source database. The new object identifiers can also be seen as Skolem terms. Hence, object-creating conjunctive queries can also be regarded as restricted second-order tuple-generating dependencies (SO tgds), considered in the data exchange literature. In this paper, we focus on the class of single-function object-creating conjunctive queries, or sifo CQs for short. We give a new characterization for oid-equivalence of sifo CQs that is simpler than the one given by Hull and Yoshikawa and places the problem in the complexity class NP. Our characterization is based on Cohen's equivalence notions for conjunctive queries with multiplicities. We also solve the logical entailment problem for sifo CQs, showing that this problem also belongs to NP. Results by Pichler et al. have shown that logical equivalence for more general classes of SO tgds is either undecidable or decidable with as yet unknown complexity upper bounds.