CL · Oct 13, 2022

Early Discovery of Disappearing Entities in Microblogs

arXiv:2210.07404v1 · 222 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the need for timely information about disappearing entities in microblogs, which helps users avoid missed opportunities or fruitless actions. It is an incremental advance, building on existing methods such as distant supervision and word embeddings.

The paper tackles the problem of detecting disappearing entities (such as events or services) in noisy microblog posts as early as possible. It uses time-sensitive distant supervision to build Twitter datasets and refines pretrained word embeddings on the microblog stream of the target day. The results show that over 70% of the detected disappearing entities in Wikipedia are discovered earlier than the corresponding Wikipedia updates, with an average lead time exceeding one month.
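To make the lead-time figures concrete, the sketch below shows one way such statistics could be computed. It is an illustration under stated assumptions, not the paper's evaluation code: the input format (pairs of detection date and Wikipedia update date) and the function name `lead_time_stats` are hypothetical, and the mean is taken over all detected entities, which may differ from how the paper averages.

```python
from datetime import date
from statistics import mean


def lead_time_stats(pairs):
    """Hypothetical lead-time summary for detected disappearing entities.

    pairs: list of (detected_on, wikipedia_updated_on) date tuples.
    Lead time is assumed to be wikipedia_updated_on - detected_on in days,
    so a positive value means the detector fired before Wikipedia was updated.
    Returns (fraction detected strictly earlier, mean lead time in days).
    """
    if not pairs:
        raise ValueError("need at least one detected entity")
    leads = [(wiki - detected).days for detected, wiki in pairs]
    earlier = sum(1 for d in leads if d > 0) / len(leads)
    # Assumption: average over all detected entities, including late ones.
    return earlier, mean(leads)


pairs = [
    (date(2021, 3, 1), date(2021, 4, 15)),   # 45 days earlier than Wikipedia
    (date(2021, 5, 10), date(2021, 5, 8)),   # 2 days later than Wikipedia
    (date(2021, 6, 1), date(2021, 7, 1)),    # 30 days earlier than Wikipedia
]
fraction_earlier, avg_lead = lead_time_stats(pairs)
print(fraction_earlier, avg_lead)
```

Here two of the three toy entities are found before their Wikipedia update, so the fraction is 2/3 and the mean lead time is 73/3 days.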

We make decisions by reacting to changes in the real world, in particular the emergence and disappearance of impermanent entities such as events, restaurants, and services. Because we want to avoid missing out on opportunities or taking fruitless actions after entities have disappeared, it is important to know as early as possible when they disappear. We thus tackle the task of detecting disappearing entities in a timely manner from microblogs, whose posts mention a wide variety of entities. The major challenge is detecting the uncertain contexts of disappearing entities in noisy microblog posts. To collect these disappearing contexts, we design time-sensitive distant supervision, which utilizes entities from a knowledge base together with time-series posts, and use it to build large-scale Twitter datasets for English and Japanese. We will release the datasets (tweet IDs) used in the experiments to promote reproducibility. To ensure robust detection in noisy environments, we refine the pretrained word embeddings of the detection model on the microblog stream of the target day. Experimental results on the Twitter datasets confirmed the effectiveness of the collected labeled data and the refined word embeddings: more than 70% of the detected disappearing entities in Wikipedia are discovered earlier than the update on Wikipedia, and the average lead time is over one month.
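Time-sensitive distant supervision pairs knowledge-base entities that have a known disappearance date with time-stamped posts that mention them. The abstract does not spell out the labeling rule, so the following is a hypothetical, simplified version: a post that mentions the entity within `window_days` before (or on) its recorded disappearance date is treated as a disappearing context (positive), a post more than `margin_days` earlier as a non-disappearing context (negative), and everything else is skipped. The `Post` class, the `label_posts` function, plain case-insensitive substring matching, and both window sizes are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    text: str
    posted_on: date


def label_posts(posts, entity, disappeared_on, window_days=7, margin_days=30):
    """Hypothetical time-sensitive labeling rule (illustrative only).

    - positive (1): post mentions `entity` within `window_days` before,
      or on, the knowledge-base disappearance date
    - negative (0): post mentions `entity` more than `margin_days` before it
    - posts in the gap between the windows, posts after the date, and posts
      that do not mention the entity are skipped
    """
    labeled = []
    for post in posts:
        # Assumption: naive case-insensitive substring match as entity linking.
        if entity.lower() not in post.text.lower():
            continue
        gap = (disappeared_on - post.posted_on).days
        if 0 <= gap <= window_days:
            labeled.append((post, 1))
        elif gap > margin_days:
            labeled.append((post, 0))
    return labeled


posts = [
    Post("Last day of Foo Fest!", date(2021, 8, 30)),     # 1 day before
    Post("Foo Fest tickets on sale", date(2021, 6, 1)),   # 91 days before
    Post("Foo Fest in two weeks", date(2021, 8, 15)),     # 16 days: in the gap
    Post("Unrelated tweet", date(2021, 8, 30)),           # no mention
    Post("Foo Fest was great", date(2021, 9, 5)),         # after disappearance
]
result = label_posts(posts, "Foo Fest", date(2021, 8, 31))
print([(p.text, y) for p, y in result])
```

Leaving a gap between the positive and negative windows is one way to reduce label noise from posts whose status is ambiguous; real thresholds would need tuning on actual data.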
