CLIR
Apr 17, 2020

Batch Clustering for Multilingual News Streaming

arXiv:2004.08123v1
17 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of organizing diverse, unstructured multilingual news streams for readers and editors. The advance is incremental, since it builds on existing methods.

The paper tackles the problem of organizing large volumes of multilingual news articles into coherent stories by extending Topic Detection and Tracking with a batch processing approach. It reports monolingual state-of-the-art results on Spanish and German datasets and crosslingual state-of-the-art results on English, Spanish, and German news.

Digital news articles are now widely available, published by many outlets and often written in different languages. This large volume of diverse, unorganized information makes it difficult, if not impossible, for a human reader to keep up. There is therefore a need for algorithms that can arrange large amounts of multilingual news into stories. To this end, we extend previous work on Topic Detection and Tracking and propose a new system inspired by newsLens. We process articles in batches, looking for monolingual local topics, which are then linked across time and languages. We introduce a novel "replaying" strategy to link monolingual local topics into stories. In addition, we propose new fine-tuned multilingual embeddings based on SBERT to create crosslingual stories. Our system achieves monolingual state-of-the-art results on datasets of Spanish and German news and crosslingual state-of-the-art results on English, Spanish, and German news.
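
As a rough illustration of the batch-then-link idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each article already has an embedding in a shared multilingual space (hand-written toy 2-D vectors stand in for SBERT output), clusters each batch per language into local topics with a greedy cosine threshold, and then attaches each local topic to the most similar existing story by centroid similarity. The function names, the thresholds, and the centroid-based linking are all assumptions made for illustration; the paper's "replaying" strategy is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def best_match(vec, groups, threshold):
    """Return the group whose centroid is most similar to vec, or None
    if no group reaches the threshold. Groups are lists of (id, vector)."""
    best, best_sim = None, threshold
    for group in groups:
        sim = cosine(vec, centroid([v for _, v in group]))
        if sim >= best_sim:
            best, best_sim = group, sim
    return best


def local_topics(articles, threshold):
    """Greedy single-pass clustering of one language's articles in a batch."""
    topics = []
    for art_id, vec in articles:
        topic = best_match(vec, topics, threshold)
        if topic is None:
            topics.append([(art_id, vec)])
        else:
            topic.append((art_id, vec))
    return topics


def link_batches(batches, local_threshold=0.8, link_threshold=0.7):
    """Process batches in order; link local topics to stories across
    time and languages. Thresholds are illustrative assumptions."""
    stories = []
    for batch in batches:
        # Group the batch by language so local topics stay monolingual.
        by_lang = {}
        for art_id, lang, vec in batch:
            by_lang.setdefault(lang, []).append((art_id, vec))
        for articles in by_lang.values():
            for topic in local_topics(articles, local_threshold):
                topic_vec = centroid([v for _, v in topic])
                story = best_match(topic_vec, stories, link_threshold)
                if story is None:
                    stories.append(list(topic))
                else:
                    story.extend(topic)
    return [[art_id for art_id, _ in story] for story in stories]


# Toy data: (article id, language, embedding). Vectors are made up.
batches = [
    [("en-1", "en", [1.0, 0.0]), ("es-1", "es", [0.9, 0.1]),
     ("de-1", "de", [0.0, 1.0])],
    [("en-2", "en", [0.95, 0.05]), ("de-2", "de", [0.1, 0.9])],
]
stories = link_batches(batches)
print(stories)
```

On this toy input the sketch groups `en-1`, `es-1`, and `en-2` into one story and `de-1` and `de-2` into another. In a real pipeline the vectors would come from an embedding model, and the thresholds would need tuning on held-out data.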

