An Analysis of Indexing and Querying Strategies on a Technologically Assisted Review Task
This work addresses the challenge of optimizing information retrieval for medical literature review, though its contribution is incremental because it builds on existing tools and datasets.
The paper tackles the problem of improving document retrieval effectiveness in technologically assisted review tasks by experimenting with different indexing and query parsing strategies on the CLEF 2017 eHealth collection. The results show that including more fields in the PubMed indexer of the Lucene4IR system significantly enhanced retrieval performance.
This paper presents a preliminary experimental study that uses the CLEF 2017 eHealth Task 2 collection to evaluate the effectiveness of different document indexing methodologies and query parsing techniques. It also aims to advance and share observations on the characteristics and usefulness of various methodologies for indexing PubMed documents and of different topic parsing techniques for producing queries. To this end, the study experiments with different document indexing methodologies using existing tools: the Lucene4IR (L4IR) information retrieval system, the Technology Assisted Reviews for Empirical Medicine tool for parsing the topics of the CLEF collection, and the TREC evaluation tool for appraising system performance. The results show that including a greater number of fields in the PubMed indexer of L4IR is a decisive factor in the retrieval effectiveness of L4IR.
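To make the effect of indexed fields concrete, the following is a minimal toy sketch in Python, not the L4IR implementation. The records, the field names (`title`, `abstract`, `mesh`), the query, and the raw term-frequency scoring are all assumptions chosen for illustration; L4IR is built on Lucene and uses its own schema and ranking models. The sketch only shows why a document whose relevant terms appear outside the title cannot be retrieved by a title-only index.

```python
import re
from collections import Counter

# Hypothetical toy records; field names are illustrative, not L4IR's schema.
DOCS = {
    "d1": {
        "title": "Ultrasound for appendicitis",
        "abstract": "Diagnostic accuracy of ultrasound in children",
        "mesh": "Appendicitis; Ultrasonography",
    },
    "d2": {
        "title": "Imaging in the emergency department",
        "abstract": "Sensitivity and specificity of ultrasound for suspected appendicitis",
        "mesh": "Appendicitis; Diagnostic Imaging",
    },
    "d3": {
        "title": "Statin therapy outcomes",
        "abstract": "Cardiovascular events under statin treatment",
        "mesh": "Hydroxymethylglutaryl-CoA Reductase Inhibitors",
    },
}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs, fields):
    """Map each document id to term counts over the selected fields only."""
    index = {}
    for doc_id, record in docs.items():
        text = " ".join(record.get(field, "") for field in fields)
        index[doc_id] = Counter(tokenize(text))
    return index


def search(index, query):
    """Rank documents by summed query-term frequency (a stand-in for a real ranking model)."""
    terms = tokenize(query)
    scored = [(sum(tf[t] for t in terms), doc_id) for doc_id, tf in index.items()]
    matches = [(score, doc_id) for score, doc_id in scored if score > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in matches]


QUERY = "ultrasound appendicitis"
title_only = search(build_index(DOCS, ["title"]), QUERY)
title_abstract = search(build_index(DOCS, ["title", "abstract"]), QUERY)
print(title_only)      # d2 is missed: its title contains no query term
print(title_abstract)  # d2 is now retrievable through its abstract
```

In this toy setting, adding the abstract field raises recall without changing the query. Whether the same holds on the CLEF collection, and by how much, is an empirical question that the paper answers with the TREC evaluation tool rather than with a sketch like this one.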