IR · DL · Sep 10, 2013

Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web

arXiv:1309.2648v1 · 10 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses link rot in social media for researchers and archivists, but it is incremental: it builds on the authors' prior work with modest improvements.

The paper revisits a dataset of tweets to analyze link disappearance and archiving rates. It finds that resources disappear from the archives (7.89%) and reappear on the live web after being declared missing (6.54%). It also proposes tweet signatures, which find replacement resources with 70+% textual similarity to the missing resource 41% of the time.

In previous work we reported that resources linked in tweets disappeared at a rate of 11% in the first year and 7.3% in each year afterwards. We also found that 6.7% of the resources were archived in public web archives in the first year, and 14.6% in each subsequent year. In this paper we revisit the same dataset of tweets and find that our prior model still holds: the calculated error in estimating the percentage missing was about 4%, but the rate of archiving produced a higher error of about 11.5%. We also discovered that resources have disappeared from the archives themselves (7.89%) as well as reappeared on the live web after being declared missing (6.54%). We also tested the availability of the tweets themselves and found that 10.34% have disappeared from the live web. To mitigate the loss of resources on the live web, we propose the use of a "tweet signature". Using the Topsy API, we extract the top five most frequent terms from the union of all tweets about a resource, and use these five terms as a query to Google. We found that using tweet signatures results in discovering replacement resources with 70+% textual similarity to the missing resource 41% of the time.
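The two quantitative ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the stopword list, the tokenization rules, and the linear interpolation within the first year are all assumptions made for illustration; the abstract states only the yearly rates and the "top five most frequent terms" rule. The `tweets` list stands in for whatever text a tweet search service would return for a resource.

```python
import re
from collections import Counter

# Assumed stopword list for illustration; the source does not specify one.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "at", "rt", "via", "with", "this", "that",
}


def estimated_percent_missing(age_years):
    """Estimate the percentage of linked resources missing after age_years.

    Uses the rates quoted in the abstract: 11% in the first year, then
    7.3% for each following year. Interpolating linearly inside the
    first year is an assumption, not something the abstract states.
    """
    if age_years <= 1:
        return 11.0 * age_years
    return 11.0 + 7.3 * (age_years - 1)


def tweet_signature(tweets, k=5):
    """Return the k most frequent terms over the union of all tweets."""
    counts = Counter()
    for text in tweets:
        text = text.lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop URLs
        text = re.sub(r"@\w+", " ", text)          # drop @mentions
        for term in re.findall(r"[a-z0-9]+", text):
            if term not in STOPWORDS and len(term) > 1:
                counts[term] += 1
    return [term for term, _ in counts.most_common(k)]


def signature_query(tweets, k=5):
    """Join the signature terms into a single search-engine query string."""
    return " ".join(tweet_signature(tweets, k))


if __name__ == "__main__":
    sample = [
        "Egypt protest in Tahrir square http://t.co/abc",
        "Tahrir square protest photos #egypt",
        "RT @user: protest at Tahrir square, Egypt today",
        "Cairo protest Tahrir",
    ]
    print(signature_query(sample))
    print(estimated_percent_missing(2))
```

The resulting query string would then be submitted to a search engine, and each candidate result compared against an archived copy of the missing resource. The similarity measure is not named in the abstract, so it is left out of this sketch.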

