AI · Dec 18, 2025

Unsupervised Thematic Clustering of Hadith Texts Using the Apriori Algorithm

arXiv:2512.16694v1 · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for digital Islamic studies and technology-based learning systems, but it is incremental as it applies an existing method to a new domain-specific dataset.

The research tackled the problem of automating thematic grouping of hadith texts by applying the Apriori algorithm to an Indonesian translation dataset, resulting in meaningful association patterns like rakaat-prayer and verse-revelation that describe themes such as worship and revelation.

This research stems from the need to automate the thematic grouping of hadith as Islamic texts are increasingly digitized. According to the literature review, unsupervised learning with the Apriori algorithm has proven effective at identifying association patterns and semantic relations in unlabeled text data. The dataset is the Indonesian translation of the hadith of Bukhari, which first passes through preprocessing stages: case folding, punctuation cleaning, tokenization, stopword removal, and stemming. Association rule mining is then performed using the Apriori algorithm with support, confidence, and lift parameters. The results reveal meaningful association patterns, such as rakaat-prayer, verse-revelation, and hadith-story, which correspond to the themes of worship, revelation, and hadith narration. These findings demonstrate that the Apriori algorithm can automatically uncover latent semantic relationships, and they contribute to the development of digital Islamic studies and technology-based learning systems.
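
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English sentences, the stopword list, the thresholds, and the function names (`preprocess`, `apriori`, `association_rules`) are all assumptions made for illustration, and stemming is omitted to keep the example dependency-free. Each text is treated as one transaction of unique terms. Support is the fraction of texts containing an itemset, confidence of A → B is support(A ∪ B) / support(A), and lift is confidence divided by support(B).

```python
import re
from itertools import combinations

# Hypothetical stopword list and toy corpus (illustration only, not the paper's data).
STOPWORDS = {"the", "of", "in", "a", "was", "is", "and", "to"}
DOCS = [
    "The prayer was two rakaat.",
    "He performed four rakaat in the prayer.",
    "The verse was revealed; the revelation came at night.",
    "A verse of revelation.",
]

def preprocess(text):
    """Case-fold, strip punctuation, tokenize, drop stopwords (no stemming here)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return frozenset(t for t in text.split() if t not in STOPWORDS)

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        # Count support of each candidate and keep the frequent ones.
        level = {}
        for cand in current:
            supp = sum(1 for t in transactions if cand <= t) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                cons = itemset - ante
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    rules.append((ante, cons, supp, conf, conf / frequent[cons]))
    return rules

transactions = [preprocess(d) for d in DOCS]
frequent = apriori(transactions, min_support=0.5)
found_rules = association_rules(frequent, min_confidence=0.8)
for ante, cons, supp, conf, lift in found_rules:
    print(sorted(ante), "->", sorted(cons), supp, conf, lift)
```

On this toy corpus the only frequent pairs are rakaat-prayer and verse-revelation, which mirrors the kind of pattern the abstract reports. A lift above 1 indicates that the two terms co-occur more often than they would if they were independent. For real Indonesian text, a language-appropriate stopword list and stemmer would be needed before mining.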

