IR · AI · Mar 6, 2023

Implementation of a noisy hyperlink removal system: A semantic and relatedness approach

arXiv:2303.03321v1 · 2 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This paper addresses a domain-specific problem in web data management and offers an incremental improvement over existing structural and string-based methods.

The paper tackles noisy hyperlinks in web graphs, which degrade information retrieval and link mining. It proposes a semantic and relatedness approach that uses the DBpedia ontology and a reasoner, and its experiments show improved removal accuracy.

As the volume of data on the web grows, the web structure graph, which is a graph representation of the web, continues to evolve. The structure of this graph has gradually shifted from content-based to non-content-based. Furthermore, spam data in the web structure graph, such as noisy hyperlinks, adversely affects the speed and efficiency of information retrieval and link mining algorithms. Previous work in this area has focused on removing noisy hyperlinks using structural and string-based approaches. However, these approaches may incorrectly remove useful links or fail to detect noisy hyperlinks in certain circumstances. In this paper, a collection of hyperlinks is first constructed using an interactive crawler. The semantic and relatedness structure of the hyperlinks is then studied through semantic web approaches and tools such as the DBpedia ontology. Finally, noisy hyperlinks are removed using a reasoner on the DBpedia ontology. Our experiments demonstrate the accuracy and ability of semantic web technologies to remove noisy hyperlinks.
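
To make the idea in the abstract more concrete, here is a minimal sketch of relatedness-based link filtering. It is not the authors' implementation: the class hierarchy is a small hand-written stand-in for an ontology (it is not loaded from DBpedia), the page-to-class mapping and URLs are hypothetical, the Wu-Palmer-style score is a swapped-in relatedness measure, and the threshold of 0.3 is arbitrary. A link is flagged as noisy when the classes of its source and target pages share little of the hierarchy.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy (child -> parent), hand-written for illustration.
# It is NOT loaded from DBpedia and is not the authors' data.
PARENT: Dict[str, str] = {
    "Agent": "Thing",
    "Person": "Agent",
    "Athlete": "Person",
    "SoccerPlayer": "Athlete",
    "Organisation": "Agent",
    "SportsTeam": "Organisation",
    "SoccerClub": "SportsTeam",
    "Work": "Thing",
    "Software": "Work",
}

# Hypothetical mapping from crawled pages to an ontology class.
PAGE_CLASS: Dict[str, str] = {
    "http://example.org/player": "SoccerPlayer",
    "http://example.org/club": "SoccerClub",
    "http://example.org/download-toolbar": "Software",
}

Link = Tuple[str, str]  # (source URL, target URL)


def ancestors(cls: str) -> List[str]:
    """Return cls followed by its superclasses, up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def relatedness(a: str, b: str) -> float:
    """Wu-Palmer-style score: 2 * depth(shared) / (depth(a) + depth(b))."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    # Deepest class that is an ancestor of both a and b, if any.
    shared = next((c for c in anc_a if c in anc_b), None)
    if shared is None:
        return 0.0
    return 2 * len(ancestors(shared)) / (len(anc_a) + len(anc_b))


def split_links(links: List[Link], threshold: float = 0.3) -> Tuple[List[Link], List[Link]]:
    """Partition links into (kept, noisy) by class relatedness."""
    kept: List[Link] = []
    noisy: List[Link] = []
    for src, dst in links:
        score = relatedness(PAGE_CLASS[src], PAGE_CLASS[dst])
        (kept if score >= threshold else noisy).append((src, dst))
    return kept, noisy


links = [
    ("http://example.org/player", "http://example.org/club"),
    ("http://example.org/player", "http://example.org/download-toolbar"),
]
kept, noisy = split_links(links)
```

In this toy setup the player-to-club link is kept and the player-to-toolbar link is flagged as noisy. A fuller system along the lines the abstract describes would obtain classes from the DBpedia ontology and delegate the inference to a reasoner; the fixed threshold here only stands in for that step.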
