Beyond Ranked Lists: The SARAL Framework for Cross-Lingual Document Set Retrieval
This work addresses the problem of retrieving sets of documents across multiple languages for users who need comprehensive information, though it appears incremental because it builds on existing CLIR efforts.
The paper tackles cross-lingual information retrieval by developing the SARAL framework to retrieve query-relevant document sets instead of ranked lists, achieving top performance in five out of six evaluation conditions across three languages.
Machine Translation for English Retrieval of Information in Any Language (MATERIAL) is an IARPA initiative aimed at advancing the state of cross-lingual information retrieval (CLIR). This report provides a detailed description of the Information Sciences Institute's (ISI's) Summarization and domain-Adaptive Retrieval Across Language (SARAL) effort for MATERIAL. Specifically, we outline our team's novel approach to CLIR, with an emphasis on developing a method suited to retrieving a query-relevant document \textit{set}, and not just a ranked document list. In MATERIAL's Phase-3 evaluations, SARAL exceeded the performance of other teams in five out of six evaluation conditions spanning three different languages (Farsi, Kazakh, and Georgian).