TakeLab Retriever: AI-Driven Search Engine for Articles from Croatian News Outlets
This work provides a domain-specific tool for researchers analyzing Croatian news media, but the contribution is incremental: it applies existing NLP methods to a new dataset.
The authors tackled the problem of searching and analyzing Croatian news articles by developing TakeLab Retriever, an AI-driven search engine that uses NLP methods to enable semantic analysis and handle over ten million articles from the past two decades.
TakeLab Retriever is an AI-driven search engine designed to discover, collect, and semantically analyze news articles from Croatian news outlets. It offers a unique perspective on the history and current landscape of Croatian online news media, making it an essential tool for researchers seeking to uncover trends, patterns, and correlations that general-purpose search engines cannot provide. TakeLab Retriever uses cutting-edge natural language processing (NLP) methods, enabling users to sift through articles by named entities, phrases, and topics in the web application. This technical report is divided into two parts: the first explains how TakeLab Retriever is used, while the second provides a detailed account of its design. In the second part, we also address the software engineering challenges involved and propose solutions for developing a microservice-based semantic search engine capable of handling over ten million news articles published over the past two decades.
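To make the idea of filtering by named entities, phrases, and topics concrete, here is a minimal in-memory sketch. It is not TakeLab Retriever's implementation or API: the `Article` record, its field names, and the `search` and `counts_by_year` helpers are all hypothetical, and a real system at the ten-million-article scale would rely on an index rather than a linear scan.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Article:
    # Hypothetical record; field names are illustrative assumptions,
    # not the schema used by TakeLab Retriever.
    title: str
    body: str
    published: date
    entities: set[str] = field(default_factory=set)  # named entities found in the text
    topics: set[str] = field(default_factory=set)    # topic labels assigned to the text


def search(articles, entity=None, phrase=None, topic=None):
    """Return the articles that satisfy every filter that was provided."""
    results = []
    for article in articles:
        if entity is not None and entity not in article.entities:
            continue
        # Case-insensitive substring match stands in for real phrase search.
        if phrase is not None and phrase.lower() not in article.body.lower():
            continue
        if topic is not None and topic not in article.topics:
            continue
        results.append(article)
    return results


def counts_by_year(articles):
    """Count articles per publication year, sorted by year, e.g. to chart a trend."""
    counts = {}
    for article in articles:
        year = article.published.year
        counts[year] = counts.get(year, 0) + 1
    return dict(sorted(counts.items()))
```

Combining the two helpers, for example `counts_by_year(search(corpus, entity="Zagreb"))`, gives the kind of per-year view of entity mentions that a researcher might use to look for trends.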