Michael Bewong

CR
h-index43
7papers
7citations
Novelty43%
AI Score27

7 Papers

SISep 13, 2024Code
MAPX: An explainable model-agnostic framework for the detection of false information on social media networks

Sarah Condran, Michael Bewong, Selasi Kwashie et al.

The automated detection of false information has become a fundamental task in combating the spread of "fake news" on online social media networks (OSMN) as it reduces the need for manual discernment by individuals. In the literature, leveraging various content or context features of OSMN documents have been found useful. However, most of the existing detection models often utilise these features in isolation without regard to the temporal and dynamic changes oft-seen in reality, thus, limiting the robustness of the models. Furthermore, there has been little to no consideration of the impact of the quality of documents' features on the trustworthiness of the final prediction. In this paper, we introduce a novel model-agnostic framework, called MAPX, which allows evidence based aggregation of predictions from existing models in an explainable manner. Indeed, the developed aggregation method is adaptive, dynamic and considers the quality of OSMN document features. Further, we perform extensive experiments on benchmarked fake news datasets to demonstrate the effectiveness of MAPX using various real-world data quality scenarios. Our empirical results show that the proposed framework consistently outperforms all state-of-the-art models evaluated. For reproducibility, a demo of MAPX is available at \href{https://github.com/SCondran/MAPX_framework}{this link}

LGSep 19, 2024
Is it Still Fair? A Comparative Evaluation of Fairness Algorithms through the Lens of Covariate Drift

Oscar Blessed Deho, Michael Bewong, Selasi Kwashie et al.

Over the last few decades, machine learning (ML) applications have grown exponentially, yielding several benefits to society. However, these benefits are tempered with concerns of discriminatory behaviours exhibited by ML models. In this regard, fairness in machine learning has emerged as a priority research area. Consequently, several fairness metrics and algorithms have been developed to mitigate against discriminatory behaviours that ML models may possess. Yet still, very little attention has been paid to the problem of naturally occurring changes in data patterns (\textit{aka} data distributional drift), and its impact on fairness algorithms and metrics. In this work, we study this problem comprehensively by analyzing 4 fairness-unaware baseline algorithms and 7 fairness-aware algorithms, carefully curated to cover the breadth of its typology, across 5 datasets including public and proprietary data, and evaluated them using 3 predictive performance and 10 fairness metrics. In doing so, we show that (1) data distributional drift is not a trivial occurrence, and in several cases can lead to serious deterioration of fairness in so-called fair models; (2) contrary to some existing literature, the size and direction of data distributional drift is not correlated to the resulting size and direction of unfairness; and (3) choice of, and training of fairness algorithms is impacted by the effect of data distributional drift which is largely ignored in the literature. Emanating from our findings, we synthesize several policy implications of data distributional drift on fairness algorithms that can be very relevant to stakeholders and practitioners.

CRApr 3, 2023
OutCenTR: A novel semi-supervised framework for predicting exploits of vulnerabilities in high-dimensional datasets

Hadi Eskandari, Michael Bewong, Sabih ur Rehman

An ever-growing number of vulnerabilities are reported every day. Yet these vulnerabilities are not all the same; Some are more targeted than others. Correctly estimating the likelihood of a vulnerability being exploited is a critical task for system administrators. This aids the system administrators in prioritizing and patching the right vulnerabilities. Our work makes use of outlier detection techniques to predict vulnerabilities that are likely to be exploited in highly imbalanced and high-dimensional datasets such as the National Vulnerability Database. We propose a dimensionality reduction technique, OutCenTR, that enhances the baseline outlier detection models. We further demonstrate the effectiveness and efficiency of OutCenTR empirically with 4 benchmark and 12 synthetic datasets. The results of our experiments show on average a 5-fold improvement of F1 score in comparison with state-of-the-art dimensionality reduction techniques such as PCA and GRP.

BMOct 20, 2024Code
A Heterogeneous Network-based Contrastive Learning Approach for Predicting Drug-Target Interaction

Junwei Hu, Michael Bewong, Selasi Kwashie et al.

Drug-target interaction (DTI) prediction is crucial for drug development and repositioning. Methods using heterogeneous graph neural networks (HGNNs) for DTI prediction have become a promising approach, with attention-based models often achieving excellent performance. However, these methods typically overlook edge features when dealing with heterogeneous biomedical networks. We propose a heterogeneous network-based contrastive learning method called HNCL-DTI, which designs a heterogeneous graph attention network to predict potential/novel DTIs. Specifically, our HNCL-DTI utilizes contrastive learning to collaboratively learn node representations from the perspective of both node-based and edge-based attention within the heterogeneous structure of biomedical networks. Experimental results show that HNCL-DTI outperforms existing advanced baseline methods on benchmark datasets, demonstrating strong predictive ability and practical effectiveness. The data and source code are available at https://github.com/Zaiwen/HNCL-DTI.

CRFeb 13, 2025
Setup Once, Secure Always: A Single-Setup Secure Federated Learning Aggregation Protocol with Forward and Backward Secrecy for Dynamic Users

Nazatul Haque Sultan, Yan Bo, Yansong Gao et al.

Federated Learning (FL) enables multiple users to collaboratively train a machine learning model without sharing raw data, making it suitable for privacy-sensitive applications. However, local model or weight updates can still leak sensitive information. Secure aggregation protocols mitigate this risk by ensuring that only the aggregated updates are revealed. Among these, single-setup protocols, where key generation and exchange occur only once, are the most efficient due to reduced communication and computation overhead. However, existing single-setup protocols often lack support for dynamic user participation and do not provide strong privacy guarantees such as forward and backward secrecy. \par In this paper, we present a novel secure aggregation protocol that requires only a single setup for the entire FL training. Our protocol supports dynamic user participation, tolerates dropouts, and achieves both forward and backward secrecy. It leverages lightweight symmetric homomorphic encryption with a key negation technique to mask updates efficiently, eliminating the need for user-to-user communication. To defend against model inconsistency attacks, we introduce a low-overhead verification mechanism using message authentication codes (MACs). We provide formal security proofs under both semi-honest and malicious adversarial models and implement a full prototype. Experimental results show that our protocol reduces user-side computation by up to $99\%$ compared to state-of-the-art protocols like e-SeaFL (ACSAC'24), while maintaining competitive model accuracy. These features make our protocol highly practical for real-world FL deployments, especially on resource-constrained devices.

AIOct 21, 2024
GIG: Graph Data Imputation With Graph Differential Dependencies

Jiang Hua, Michael Bewong, Selasi Kwashie et al.

Data imputation addresses the challenge of imputing missing values in database instances, ensuring consistency with the overall semantics of the dataset. Although several heuristics which rely on statistical methods, and ad-hoc rules have been proposed. These do not generalise well and often lack data context. Consequently, they also lack explainability. The existing techniques also mostly focus on the relational data context making them unsuitable for wider application contexts such as in graph data. In this paper, we propose a graph data imputation approach called GIG which relies on graph differential dependencies (GDDs). GIG, learns the GDDs from a given knowledge graph, and uses these rules to train a transformer model which then predicts the value of missing data within the graph. By leveraging GDDs, GIG incoporates semantic knowledge into the data imputation process making it more reliable and explainable. Experimental results on seven real-world datasets highlight GIG's effectiveness compared to existing state-of-the-art approaches.

DBAug 13, 2019
Linking Graph Entities with Multiplicity and Provenance

Jixue Liu, Selasi Kwashie, Jiuyong Li et al.

Entity linking and resolution is a fundamental database problem with applications in data integration, data cleansing, information retrieval, knowledge fusion, and knowledge-base population. It is the task of accurately identifying multiple, differing, and possibly contradicting representations of the same real-world entity in data. In this work, we propose an entity linking and resolution system capable of linking entities across different databases and mentioned-entities extracted from text data. Our entity linking/resolution solution, called Certus, uses a graph model to represent the profiles of entities. The graph model is versatile, thus, it is capable of handling multiple values for an attribute or a relationship, as well as the provenance descriptions of the values. Provenance descriptions of a value provide the settings of the value, such as validity periods, sources, security requirements, etc. This paper presents the architecture for the entity linking system, the logical, physical, and indexing models used in the system, and the general linking process. Furthermore, we demonstrate the performance of update operations of the physical storage models when the system is implemented in two state-of-the-art database management systems, HBase and Postgres.