SISep 13, 2024Code
MAPX: An explainable model-agnostic framework for the detection of false information on social media networksSarah Condran, Michael Bewong, Selasi Kwashie et al.
The automated detection of false information has become a fundamental task in combating the spread of "fake news" on online social media networks (OSMN) as it reduces the need for manual discernment by individuals. In the literature, leveraging various content or context features of OSMN documents have been found useful. However, most of the existing detection models often utilise these features in isolation without regard to the temporal and dynamic changes oft-seen in reality, thus, limiting the robustness of the models. Furthermore, there has been little to no consideration of the impact of the quality of documents' features on the trustworthiness of the final prediction. In this paper, we introduce a novel model-agnostic framework, called MAPX, which allows evidence based aggregation of predictions from existing models in an explainable manner. Indeed, the developed aggregation method is adaptive, dynamic and considers the quality of OSMN document features. Further, we perform extensive experiments on benchmarked fake news datasets to demonstrate the effectiveness of MAPX using various real-world data quality scenarios. Our empirical results show that the proposed framework consistently outperforms all state-of-the-art models evaluated. For reproducibility, a demo of MAPX is available at \href{https://github.com/SCondran/MAPX_framework}{this link}
LGSep 19, 2024
Is it Still Fair? A Comparative Evaluation of Fairness Algorithms through the Lens of Covariate DriftOscar Blessed Deho, Michael Bewong, Selasi Kwashie et al.
Over the last few decades, machine learning (ML) applications have grown exponentially, yielding several benefits to society. However, these benefits are tempered with concerns of discriminatory behaviours exhibited by ML models. In this regard, fairness in machine learning has emerged as a priority research area. Consequently, several fairness metrics and algorithms have been developed to mitigate against discriminatory behaviours that ML models may possess. Yet still, very little attention has been paid to the problem of naturally occurring changes in data patterns (\textit{aka} data distributional drift), and its impact on fairness algorithms and metrics. In this work, we study this problem comprehensively by analyzing 4 fairness-unaware baseline algorithms and 7 fairness-aware algorithms, carefully curated to cover the breadth of its typology, across 5 datasets including public and proprietary data, and evaluated them using 3 predictive performance and 10 fairness metrics. In doing so, we show that (1) data distributional drift is not a trivial occurrence, and in several cases can lead to serious deterioration of fairness in so-called fair models; (2) contrary to some existing literature, the size and direction of data distributional drift is not correlated to the resulting size and direction of unfairness; and (3) choice of, and training of fairness algorithms is impacted by the effect of data distributional drift which is largely ignored in the literature. Emanating from our findings, we synthesize several policy implications of data distributional drift on fairness algorithms that can be very relevant to stakeholders and practitioners.
BMOct 20, 2024Code
A Heterogeneous Network-based Contrastive Learning Approach for Predicting Drug-Target InteractionJunwei Hu, Michael Bewong, Selasi Kwashie et al.
Drug-target interaction (DTI) prediction is crucial for drug development and repositioning. Methods using heterogeneous graph neural networks (HGNNs) for DTI prediction have become a promising approach, with attention-based models often achieving excellent performance. However, these methods typically overlook edge features when dealing with heterogeneous biomedical networks. We propose a heterogeneous network-based contrastive learning method called HNCL-DTI, which designs a heterogeneous graph attention network to predict potential/novel DTIs. Specifically, our HNCL-DTI utilizes contrastive learning to collaboratively learn node representations from the perspective of both node-based and edge-based attention within the heterogeneous structure of biomedical networks. Experimental results show that HNCL-DTI outperforms existing advanced baseline methods on benchmark datasets, demonstrating strong predictive ability and practical effectiveness. The data and source code are available at https://github.com/Zaiwen/HNCL-DTI.
AIOct 21, 2024
GIG: Graph Data Imputation With Graph Differential DependenciesJiang Hua, Michael Bewong, Selasi Kwashie et al.
Data imputation addresses the challenge of imputing missing values in database instances, ensuring consistency with the overall semantics of the dataset. Although several heuristics which rely on statistical methods, and ad-hoc rules have been proposed. These do not generalise well and often lack data context. Consequently, they also lack explainability. The existing techniques also mostly focus on the relational data context making them unsuitable for wider application contexts such as in graph data. In this paper, we propose a graph data imputation approach called GIG which relies on graph differential dependencies (GDDs). GIG, learns the GDDs from a given knowledge graph, and uses these rules to train a transformer model which then predicts the value of missing data within the graph. By leveraging GDDs, GIG incoporates semantic knowledge into the data imputation process making it more reliable and explainable. Experimental results on seven real-world datasets highlight GIG's effectiveness compared to existing state-of-the-art approaches.
DBAug 13, 2019
Linking Graph Entities with Multiplicity and ProvenanceJixue Liu, Selasi Kwashie, Jiuyong Li et al.
Entity linking and resolution is a fundamental database problem with applications in data integration, data cleansing, information retrieval, knowledge fusion, and knowledge-base population. It is the task of accurately identifying multiple, differing, and possibly contradicting representations of the same real-world entity in data. In this work, we propose an entity linking and resolution system capable of linking entities across different databases and mentioned-entities extracted from text data. Our entity linking/resolution solution, called Certus, uses a graph model to represent the profiles of entities. The graph model is versatile, thus, it is capable of handling multiple values for an attribute or a relationship, as well as the provenance descriptions of the values. Provenance descriptions of a value provide the settings of the value, such as validity periods, sources, security requirements, etc. This paper presents the architecture for the entity linking system, the logical, physical, and indexing models used in the system, and the general linking process. Furthermore, we demonstrate the performance of update operations of the physical storage models when the system is implemented in two state-of-the-art database management systems, HBase and Postgres.