NI Nov 7, 2025
A Taxonomy and Comparative Analysis of IPv4 Identifier Selection Correctness, Security, and Performance
Joshua J. Daymude, Antonio M. Espinoza, Holly Bergen et al.
The battle for a more secure Internet is waged on many fronts, including the most basic of networking protocols. Our focus is the IPv4 Identifier (IPID), an IPv4 header field as old as the Internet with an equally long history as an exploited side channel for scanning network properties, inferring off-path connections, and poisoning DNS caches. This article taxonomizes the 25-year history of IPID-based exploits and the corresponding changes to IPID selection methods. By mathematically analyzing these methods' correctness and security and empirically evaluating their performance, we reveal recommendations for best practice as well as shortcomings of current operating system implementations, emphasizing the value of systematic evaluations in network security.
CR Jul 30, 2020
The Program with a Personality: Analysis of Elk Cloner, the First Personal Computer Virus
Scott Levy, Jedidiah R. Crandall
Although self-replicating programs and viruses have existed since the 1960s and 70s, Elk Cloner was the first virus to circulate among personal computers in the wild. Despite its historical significance, it received comparatively little attention when it first appeared in 1982. In this paper, we: present the first detailed examination of the operation and structure of Elk Cloner; discuss the effect of environmental characteristics on its virulence; and provide supporting evidence for several hypotheses about why its release was largely ignored in the early 1980s.
SI Jun 26, 2019
Assessing Post Deletion in Sina Weibo: Multi-modal Classification of Hot Topics
Meisam Navaki Arefi, Rajkumar Pandi, Michael Carl Tschantz et al.
Widespread Chinese social media applications such as Weibo are well known for monitoring and deleting posts to conform to Chinese government requirements. In this paper, we analyze a dataset of censored and uncensored posts in Weibo. Unlike previous work, which considers only the text content of posts, we take a multi-modal approach that accounts for both text and image content. We divide this dataset into 14 categories that have the potential to be censored on Weibo, and seek to quantify censorship by topic. Specifically, we investigate how different factors interact to affect censorship, and how consistently and how quickly different topics are censored. To this end, we have assembled an image dataset with 18,966 images, as well as a text dataset with 994 posts from 14 categories. We then used deep learning, CNN localization, and NLP techniques to analyze the target dataset and extract categories for further analysis, to better understand censorship mechanisms in Weibo. We found that sentiment is the only indicator of censorship that is consistent across the variety of topics we identified. This finding is consistent with recently leaked logs from Sina Weibo. We also discovered that most categories, such as those related to anti-government actions (e.g., protest) or to politicians (e.g., Xi Jinping), are often censored, whereas some categories, such as crisis-related ones (e.g., rainstorm), are less frequently censored. We also found that censored posts across all categories take three hours on average to be deleted.
CY Mar 4, 2013
The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions
Tao Zhu, David Phipps, Adam Pridgen et al.
Weibo and other popular Chinese microblogging sites are well known for exercising internal censorship to comply with Chinese government requirements. This research seeks to quantify the mechanisms of this censorship: how fast and how comprehensively posts are deleted. Our analysis considered 2.38 million posts gathered over roughly two months in 2012, with our attention focused on repeatedly visiting "sensitive" users. This gives us a view of censorship events within minutes of their occurrence, albeit at the cost of our data no longer representing a random sample of the general Weibo population. We also have a larger sample of 470 million posts from Weibo's public timeline, taken over a longer time period, that is more representative of a random sample. We found that deletions happen most heavily in the first hour after a post has been submitted. Focusing on original posts, not reposts/retweets, we observed that nearly 30% of the total deletion events occur within 5-30 minutes. Nearly 90% of the deletions happen within the first 24 hours. Leveraging our data, we also considered a variety of hypotheses about the mechanisms used by Weibo for censorship, such as the extent to which Weibo's censors use retrospective keyword-based censorship, and how repost/retweet popularity interacts with censorship. We also used natural language processing techniques to analyze which topics were more likely to be censored.
IR Jun 21, 2012
A Pointillism Approach for Natural Language Processing of Social Media
Peiyou Song, Anhei Shu, Anyu Zhou et al.
The Chinese language poses challenges for natural language processing based on the unit of a word, even for formal uses of the language; social media only makes word segmentation in Chinese more difficult. In this document we propose a pointillism approach to natural language processing. Rather than words that have individual meanings, the basic unit of a pointillism approach is trigrams of characters. These grams take on meaning in aggregate when they appear together in a way that is correlated over time. Our results from three kinds of experiments show that when words and topics do have a meme-like trend, they can be reconstructed from only trigrams. For example, for 4-character idioms that appear at least 99 times in one day in our data, the unconstrained precision (that is, precision that allows for deviation from a lexicon when the result is just as correct as the lexicon version of the word or phrase) is 0.93. For longer words and phrases collected from Wiktionary, including neologisms, the unconstrained precision is 0.87. We consider these results to be very promising, because they suggest that it is feasible for a machine to reconstruct complex idioms, phrases, and neologisms with good precision without any notion of words. Thus the colorful and baroque uses of language that typify social media in challenging languages such as Chinese may in fact be accessible to machines.
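As a rough illustration of the basic unit described in this abstract, the sketch below extracts overlapping character trigrams and counts them per day. It is not the authors' implementation: the function names `char_trigrams` and `daily_trigram_counts` and the `posts_by_day` input shape are assumptions made for this example, and the correlation-over-time and reconstruction steps are not shown.

```python
from collections import Counter


def char_trigrams(text):
    """Return every overlapping 3-character window of text (no word segmentation)."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def daily_trigram_counts(posts_by_day):
    """Map each day label to a Counter of trigram frequencies over that day's posts.

    posts_by_day is a hypothetical input: {day_label: [post_text, ...]}.
    """
    return {
        day: Counter(gram for post in posts for gram in char_trigrams(post))
        for day, posts in posts_by_day.items()
    }
```

For instance, `char_trigrams("守株待兔")` yields the two trigrams `守株待` and `株待兔`; per-day counts of such trigrams would be the raw series in which trends could then be sought.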