Ashutosh Dixit

IR
6 papers
24 citations
Novelty 23%
AI Score 15


IR, Dec 30, 2015
URL ordering policies for distributed crawlers: a review

Deepika, Ashutosh Dixit

As the web grows, the information on it is spreading at a large scale, and search engines are the medium for accessing it. The crawler is the search engine module responsible for downloading web pages. To download fresh information and keep the database rich, the crawler should crawl the web in some order, which is called URL ordering. URL ordering should be efficient and effective so that the web is crawled proficiently. This paper surveys some existing URL ordering methods and closes with a comparison among them.
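The idea of crawling in some order can be illustrated with a priority-based frontier. The sketch below is only a minimal illustration, not one of the surveyed policies: the `URLFrontier` class, the example URLs, and the numeric scores are all assumptions, and a real ordering policy would compute the score from signals such as link structure or freshness.

```python
import heapq
from itertools import count


class URLFrontier:
    """Priority-ordered crawl frontier (hypothetical helper, not from the paper)."""

    def __init__(self):
        self._heap = []        # entries: (-score, insertion order, url)
        self._seen = set()     # URLs already enqueued, to avoid duplicates
        self._order = count()  # tie-breaker: equal scores leave first-in, first-out

    def add(self, url, score):
        """Enqueue a URL once; a higher score means it is crawled earlier."""
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, next(self._order), url))
        return True

    def pop(self):
        """Return the highest-scoring URL still waiting in the frontier."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


# Usage with made-up scores: higher-scored URLs leave the frontier first.
frontier = URLFrontier()
frontier.add("http://example.com/news", score=0.9)
frontier.add("http://example.com/about", score=0.2)
frontier.add("http://example.com/blog", score=0.9)
crawl_order = [frontier.pop() for _ in range(len(frontier))]
```

A heap keeps both insertion and removal logarithmic in the frontier size, and the seen-set prevents the same URL from being queued twice.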

IR, Sep 23, 2015
Design and Implementation of Domain based Semantic Hidden Web Crawler

Manvi, Komal Kumar Bhatia, Ashutosh Dixit

The web broadly consists of the surface web and the hidden web. Traditional web crawlers can easily access the surface web, but they cannot crawl the hidden portion. These crawlers retrieve content from pages linked by hyperlinks and ignore the information hidden behind form pages, which cannot be reached through the hyperlink structure alone. As a result, they miss a large amount of data hidden behind search forms. This paper focuses on extracting the hidden data behind HTML search forms. The proposed technique uses semantic mapping to fill the HTML search form from a domain-specific database. Using semantics to fill the fields of a form leads to more accurate and higher-quality data extraction.
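One simple way to picture semantic form filling is a lookup from form labels to canonical domain attributes, followed by a lookup of values for those attributes. The sketch below is an assumption-laden illustration, not the paper's technique: the synonym table, the `DOMAIN_DB` contents, and the function names are hypothetical, and the paper's semantic mapping may work quite differently.

```python
# Hypothetical synonym table: each canonical domain attribute maps to labels
# that a search form might use for it. All entries are made up for illustration.
SYNONYMS = {
    "title": {"title", "book title", "name of book"},
    "author": {"author", "writer", "written by"},
}

# Hypothetical domain-specific database of known values per attribute.
DOMAIN_DB = {
    "title": ["Example Book"],
    "author": ["Example Author"],
}


def normalize(label):
    """Lowercase a form label and strip surrounding spaces and a trailing colon."""
    return label.strip().rstrip(":").strip().lower()


def map_label(label):
    """Return the canonical attribute for a form label, or None if unknown."""
    key = normalize(label)
    for attribute, labels in SYNONYMS.items():
        if key in labels:
            return attribute
    return None


def fill_form(field_labels):
    """Map each recognized form label to the first database value for its attribute."""
    filled = {}
    for label in field_labels:
        attribute = map_label(label)
        if attribute is not None and DOMAIN_DB.get(attribute):
            filled[label] = DOMAIN_DB[attribute][0]
    return filled


# Usage: "ISBN" has no mapping in the table, so it is left unfilled.
submission = fill_form(["Book Title:", "Written by", "ISBN"])
```

Fields that cannot be mapped are skipped rather than guessed, so a crawler built this way would submit only values it can justify from the domain database.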

IR, Aug 10, 2015
A novel design of hidden web crawler using ontology

Manvi, Komal Kumar Bhatia, Ashutosh Dixit

The Deep Web is content hidden behind HTML forms. Since it represents a large portion of the structured, unstructured and dynamic data on the Web, accessing Deep-Web content has been a long-standing challenge for the database community. This paper describes a crawler that accesses the Deep Web using ontologies. Performance evaluation of the proposed work showed promising results.

IR, Jul 31, 2013
A Novel Architecture for Relevant Blog Page Identification

Deepti Kapri, Rosy Madaan, A. K. Sharma et al.

Blogs are undoubtedly the richest source of information available in cyberspace. Blogs can be of various kinds: personal blogs contain posts on mixed issues, while domain-specific blogs contain posts on particular topics. For this reason, they offer a wide variety of relevant information that is often focused. A general search engine returns a huge collection of web pages that may or may not give correct answers, because the web is a repository of information of all kinds, and a user has to go through many documents before finding what he was originally looking for, which is very time-consuming. The search can therefore be made more focused and accurate if it is limited to the blogosphere instead of all web pages, because blogs are more focused in terms of information. The user will then get only related blogs in response to his query. These results are then ranked according to our proposed method and finally presented to the user in descending order.

IR, Jul 26, 2013
A Novel Architecture For Question Classification Based Indexing Scheme For Efficient Question Answering

Renu Mudgal, Rosy Madaan, A. K. Sharma et al.

A question answering system can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive compact answers. Research has shown that, for a question answering system to be successful, correct classification of the question with respect to the expected answer type is a prerequisite. We propose a novel architecture for question classification and for searching an index maintained on the basis of expected answer types, for efficient question answering. The system uses Answer Relevance Score criteria to find the relevance of each answer it returns. Analysis of the proposed system found that it shows more promising results than existing systems based on question classification.
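An index maintained by expected answer type can be sketched as a dictionary of buckets, where a question is classified first and only the matching bucket is searched. The code below is a minimal sketch under stated assumptions: the rule table, the `AnswerTypeIndex` class, and the sample passages are hypothetical, and the term-overlap filter is a placeholder, not the Answer Relevance Score, which the abstract does not define.

```python
from collections import defaultdict

# Hypothetical rules: the leading question word decides the expected answer type.
ANSWER_TYPE_RULES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
}


def classify_question(question):
    """Return the expected answer type of a question, or "OTHER" if no rule fits."""
    words = question.strip().lower().split()
    first_word = words[0] if words else ""
    return ANSWER_TYPE_RULES.get(first_word, "OTHER")


class AnswerTypeIndex:
    """Index whose entries are grouped by expected answer type (illustrative only)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, answer_type, passage):
        """Store a passage under the answer type it can satisfy."""
        self._buckets[answer_type].append(passage)

    def search(self, question):
        """Search only the bucket that matches the question's expected answer type."""
        answer_type = classify_question(question)
        terms = set(question.lower().rstrip("?").split())
        candidates = self._buckets.get(answer_type, [])
        # Placeholder relevance test: keep passages sharing any term with the question.
        return [p for p in candidates if terms & set(p.lower().split())]


# Usage with made-up passages: the PERSON bucket is never scanned for a "where" question.
index = AnswerTypeIndex()
index.add("LOCATION", "The library is in the main building")
index.add("PERSON", "The library was founded by the first director")
results = index.search("Where is the library?")
```

Restricting the search to one bucket is what makes the lookup cheaper than scanning the whole index, at the cost of depending on the classifier being right.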

IR, Feb 28, 2013
Presence Factor-Oriented Blog Summarization

Rosy Madaan, A. K. Sharma, Ashutosh Dixit

Research on blogs has focused on blog posts only, ignoring the title of the blog page. Also, in summarization, only a set of representative sentences is extracted. Some analysis has been done, and it has been found that the blog post contains content that is likely to be related to the topic of the blog post. Thus, the proposed summarization system makes use of the title contained in a blog page. The approach uses the Presence factor, which indicates the presence of each term of the title in each sentence of the blog post. This is a key feature because it treats sentences that contain each of the terms present in the title as more relevant for summarization. The system has been implemented and evaluated experimentally, and it has shown promising results.
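A title-driven sentence score of this kind can be sketched as follows. The abstract does not give a formula, so the definition used here (the fraction of distinct title terms that appear in a sentence) is an assumption, as are the function names and the sample sentences.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def presence_factor(title, sentence):
    """Fraction of distinct title terms that occur in the sentence (assumed definition)."""
    title_terms = set(tokenize(title))
    if not title_terms:
        return 0.0
    sentence_terms = set(tokenize(sentence))
    return len(title_terms & sentence_terms) / len(title_terms)


def summarize(title, sentences, top_k=1):
    """Return the top_k sentences by presence factor, kept in their original order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: presence_factor(title, sentences[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:top_k])
    return [sentences[i] for i in chosen]


# Usage with made-up text: sentences sharing more title terms are preferred.
title = "Blog Summarization"
sentences = [
    "Summarization of a blog uses its title.",
    "The weather was nice.",
    "A blog can cover many topics.",
]
summary = summarize(title, sentences, top_k=2)
```

Under this assumed definition, a sentence containing every title term scores 1.0, which matches the abstract's point that such sentences are treated as the most relevant.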