A New Compression Based Index Structure for Efficient Information Retrieval
This paper addresses the issue of exponential information growth degrading retrieval quality for users of IR systems, but the contribution is incremental, as it builds on existing compression techniques.
The paper tackled the problem of large index structures in information retrieval systems by compressing document numbers in inverted file entries using a new run-length encoding-based coding technique, achieving an average compression improvement of 67.34% compared to other methods.
Finding desired information in a large data set is a difficult problem. Information retrieval (IR) is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. The index is the main constituent of an IR system. Nowadays, the exponential growth of information makes the index structure large enough to affect the IR system's quality. Compressing the index structure is therefore our main contribution in this paper. We compress the document numbers in inverted file entries using a new coding technique based on run-length encoding. Our coding mechanism uses a specified code that acts on top of run-length coding. In our experiments, our coding mechanism compresses on average 67.34% more than the other techniques.
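The abstract does not spell out the specific code applied on top of run-length coding, so the following is only a minimal sketch of the general idea, not the coding technique proposed in the paper. It assumes that the document numbers of one inverted file entry are sorted, that they are first converted to differences between consecutive numbers (d-gaps, a common preprocessing step that the abstract does not confirm), and that repeated gaps are then collapsed into (value, count) runs. All function names are hypothetical.

```python
from typing import List, Tuple


def to_gaps(doc_numbers: List[int]) -> List[int]:
    """Convert sorted document numbers to gaps between consecutive entries.

    Assumption: gap conversion is an illustrative preprocessing step,
    not something stated in the abstract.
    """
    gaps = []
    previous = 0
    for number in doc_numbers:
        gaps.append(number - previous)
        previous = number
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps by taking a running sum of the gaps."""
    doc_numbers = []
    total = 0
    for gap in gaps:
        total += gap
        doc_numbers.append(total)
    return doc_numbers


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Plain run-length encoding: collapse equal neighbours into (value, count)."""
    runs: List[Tuple[int, int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) runs back into the original sequence."""
    values: List[int] = []
    for value, count in runs:
        values.extend([value] * count)
    return values


# Hypothetical inverted file entry: document numbers containing one term.
postings = [3, 4, 5, 6, 7, 10, 20, 21, 22]
gaps = to_gaps(postings)      # [3, 1, 1, 1, 1, 3, 10, 1, 1]
runs = rle_encode(gaps)       # [(3, 1), (1, 4), (3, 1), (10, 1), (1, 2)]
restored = from_gaps(rle_decode(runs))
```

In this sketch, runs of consecutive document numbers become runs of gap value 1, which is where run-length coding saves space. A real scheme would additionally need a compact bit-level code for the run values and counts, which is the part the paper contributes and which is not reproduced here.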