IR · Dec 30, 2015

URL ordering policies for distributed crawlers: a review

arXiv:1611.01228v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey for search engine developers, summarizing and comparing known techniques without introducing new methods.

The paper reviews existing methods for URL ordering in distributed web crawlers and compares their efficiency and effectiveness, with the aim of helping crawlers cover the web more efficiently.

As the Web grows, the information it holds is spreading at a large scale. Search engines are the medium for accessing this information. The crawler is the search-engine module responsible for downloading web pages. To download fresh information and enrich the search engine's database, the crawler should visit the web in some order; this is called URL ordering. URL ordering must be efficient and effective for the crawl as a whole to be efficient. This paper surveys some existing URL-ordering methods and concludes with a comparison among them.
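To make "crawl the web in some order" concrete, here is a minimal Python sketch of an ordered URL frontier. It assumes one commonly discussed ordering metric, the number of in-links (backlinks) to a URL discovered so far in the crawl, and, for the distributed setting, a simple host-hash rule that assigns every URL of a host to the same crawler node. Both choices are illustrative assumptions, not the specific methods surveyed in this paper, and the names `BacklinkFrontier`, `add_link`, `pop`, and `assign_node` are hypothetical.

```python
import hashlib
import heapq
import itertools
from collections import defaultdict
from urllib.parse import urlsplit


class BacklinkFrontier:
    """Hypothetical frontier: pop the uncrawled URL with the most known in-links first.

    Scores are in-link counts observed so far (an assumed metric for illustration).
    """

    def __init__(self):
        self._backlinks = defaultdict(int)  # url -> in-links discovered so far
        self._heap = []                     # entries: (-score, tie_breaker, url)
        self._counter = itertools.count()   # FIFO tie-break among equal scores
        self._crawled = set()               # URLs already handed out by pop()

    def add_link(self, url):
        """Record one discovered link to `url` and re-queue it with its new score."""
        if url in self._crawled:
            return
        self._backlinks[url] += 1
        heapq.heappush(self._heap, (-self._backlinks[url], next(self._counter), url))

    def pop(self):
        """Return the uncrawled URL with the highest current score, or None if empty."""
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            # Lazy deletion: skip crawled URLs and stale entries whose score was raised later.
            if url in self._crawled or -neg_score != self._backlinks[url]:
                continue
            self._crawled.add(url)
            return url
        return None


def assign_node(url, num_nodes):
    """Hypothetical partitioning rule: all URLs on the same host map to the same node."""
    host = urlsplit(url).hostname or ""
    # Use a stable digest; Python's built-in hash() of str is randomized per process.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

For example, after `add_link` is called for `http://example.com/a`, twice for `http://example.com/b`, and once for `http://example.org/c`, successive `pop()` calls return `.../b` (two in-links) first, then `.../a` and `.../c` in discovery order. The lazy-deletion heap avoids a decrease-key operation, at the cost of leaving stale entries that `pop()` discards. Host hashing keeps per-host politeness and duplicate detection local to one node, but it is only one of several possible partitioning choices.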

