NI May 29
Where's Waldo Library? Using Reverse IP Geolocation to Identify Library IPs
Nishant Acharya, Anyu Yang, Humaira Fasih Ahmed Hashmi et al.
Community anchor institutions (CAIs), such as libraries, schools, and community centers, are critical for providing Internet access to un- or under-served individuals and communities. Because many of these institutions are themselves under-provisioned, analyzing the reliability and quality of their Internet service is important. Doing so at scale requires knowing the IP addresses of these institutions so that broadband measurement and policy evaluation can occur. Unfortunately, these IPs are not systematically documented. As a first step towards widespread, scalable evaluation of CAI Internet connectivity, this paper presents Reverse IP Geolocation (RG), a new framework to infer IP addresses from physical address data. A key insight is that CAI street addresses are publicly known, which allows us to identify a candidate set of IPs from commercial geolocation that are likely serving the location associated with a CAI. In this paper, we focus on US public libraries, which offer both geographic diversity across thousands of locations, and some publicly available institutional records (e.g., WHOIS registrations) that enable systematic validation of our approach. Our approach offers a novel integration of IP geolocation databases, DNS PTR records, WHOIS registrations, broadband provider data, and active measurements to identify IPs likely assigned to libraries and validate them. Based on evaluations, our approach can map a library to its IP prefix approximately half of the time, with coverage across all US states, as well as urban and rural areas. Our results highlight the feasibility of mapping CAI presence in IP space and offer a foundation for large-scale, remote broadband infrastructure evaluation.
NI May 29
Stratifying the Digital Divide: Analysis of Socio-Economic Influences on Internet Performance
Shivani Kalamadi, Aditya Bej, Sachin Kumar Singh et al.
Despite numerous technological advancements, the digital divide remains a pressing issue affecting millions worldwide. We present a framework for diagnosing internet inequality at the Census Block Group level by pairing approximately 170 million crowdsourced Ookla speed tests (2021--2025) with U.S. Census demographics across six metropolitan regions. After quantifying and correcting for sampling bias, we use Random Forest regression with permutation importance to identify the socio-economic drivers of download speed, upload speed, and latency. Population density dominates all three metrics at the regional level, but this dominance is an artifact of scale: once areas are stratified into density bins, its influence vanishes in medium- and higher-density neighborhoods, revealing that socio-economic conditions are the true differentiators of internet quality in most urban settings. After controlling for density, income and racial composition emerge as the primary drivers, income consistently dictating upload speed and racial composition proving to be a stronger predictor of download speed than either income or education. Our findings demonstrate that internet inequality is locally configured: no single national narrative explains it, and effective policy demands region-specific intervention.
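The permutation-importance step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract names Random Forest regression, whereas the `predict` function here is a toy stand-in, and the names `mse`, `permutation_importance`, `n_repeats`, and the synthetic data are all assumptions made for illustration. The idea is to shuffle one feature column at a time and record how much the prediction error grows.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    `predict` maps one row (a list of features) to a number. Any fitted
    regressor could be wrapped this way; a toy function is used below.
    """
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances

# Synthetic example (hypothetical data): the target depends only on feature 0.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
toy_predict = lambda row: 2.0 * row[0]  # stand-in for a trained model
scores = permutation_importance(toy_predict, X, y)
```

In this toy setup the second feature is ignored by the model, so shuffling it leaves the error unchanged, while shuffling the first feature increases it.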
AP Nov 4, 2023
Mobile Internet Quality Estimation using Self-Tuning Kernel Regression
Hanyang Jiang, Henry Shaowu Yuchi, Elizabeth Belding et al.
Modeling and estimation for spatial data are ubiquitous in real life, frequently appearing in weather forecasting, pollution detection, and agriculture. Spatial data analysis often involves processing datasets of enormous scale. In this work, we focus on large-scale internet-quality open datasets from Ookla. We look into estimating mobile (cellular) internet quality at the scale of a state in the United States. In particular, we aim to conduct estimation based on highly imbalanced data: most of the samples are concentrated in limited areas, while very few are available in the rest, posing significant challenges to modeling efforts. We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of data imbalance in this problem. Through comparative experimentation on two distinct mobile network measurement datasets, we demonstrate that the proposed self-tuning kernel regression method produces more accurate predictions, with the potential to be applied in other applications.
NI Mar 30
Quality of Coverage (QoC): Quantifying Cellular Network Coverage Quality, Usability and Stability
Varshika Srinivasavaradhan, Morgan Vigil-Hayes, Ellen Zegura et al.
Characterizing cellular network performance is complex. Current representations of cellular coverage, such as service provider and FCC coverage maps, focus only on the minimal level of available bandwidth (e.g., 35/3 Mbps download/upload speed for 5G) and omit critical dimensions of quality: network usability and stability over space and time. Because cellular performance can vary substantially along both dimensions, a more fine-grained characterization is necessary. We introduce Quality of Coverage (QoC), a novel multi-dimensional set of key performance indicators (KPIs) that capture measured temporal and spatial performance quality, usability and stability. To evaluate QoC, we first analyze whether the QoC KPIs accurately reflect expected network behavior at individual locations and across spatially-aggregated regions. Then, we apply QoC to more than 15 million measurements from a production network to evaluate its ability to characterize real-world network behavior. Together, our results demonstrate the need for KPIs that capture the full spectrum of cellular performance and show how QoC enables rigorous evaluation of coverage quality across multiple geographic scales.
NI Oct 24, 2025
TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests
Haarika Manda, Manshi Sagar, Yogesh et al.
Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of megabytes, and collectively, platforms like Ookla, M-Lab, and Fast.com generate petabytes of traffic each month. Reducing this burden requires deciding when a test can be stopped early without sacrificing accuracy. We frame this as an optimal stopping problem and show that existing heuristics (static thresholds, BBR pipe-full signals, or throughput stability rules from Fast.com and FastBTS) capture only a narrow portion of the achievable accuracy-savings trade-off. This paper introduces TURBOTEST, a systematic framework for speed test termination that sits atop existing platforms. The key idea is to decouple throughput prediction (Stage 1) from test termination (Stage 2): Stage 1 trains a regressor to estimate final throughput from partial measurements, while Stage 2 trains a classifier to decide when sufficient evidence has accumulated to stop. Leveraging richer transport-level features (RTT, retransmissions, congestion window) alongside throughput, TURBOTEST exposes a single tunable parameter for accuracy tolerance and includes a fallback mechanism for high-variability cases. Evaluation on 173,000 M-Lab NDT speed tests (2024-2025) shows that TURBOTEST achieves nearly 2-4x higher data savings than an approach based on BBR signals while reducing median error. These results demonstrate that adaptive ML-based termination can deliver accurate, efficient, and deployable speed tests at scale.
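The decoupling of prediction from termination described above can be sketched as a loop with two pluggable functions. This is only a structural illustration: in the paper both stages are trained models using transport-level features, whereas the `toy_predict_final` and `toy_should_stop` functions below are hand-written placeholders (a simple stability rule of the kind the abstract lists among existing heuristics). All names, the tolerance value, and the sample data are assumptions.

```python
def run_with_early_stop(samples, predict_final, should_stop, min_samples=3):
    """Consume throughput samples until the stop rule fires.

    Returns (estimated final throughput, number of samples used).
    Assumes `samples` yields at least one value.
    """
    seen = []
    for s in samples:
        seen.append(s)
        if len(seen) >= min_samples and should_stop(seen):
            break
    return predict_final(seen), len(seen)

def toy_predict_final(seen):
    """Placeholder for a Stage 1 regressor: mean of the last three samples."""
    tail = seen[-3:]
    return sum(tail) / len(tail)

def toy_should_stop(seen, tolerance=0.05):
    """Placeholder for a Stage 2 classifier: stop once the last three
    samples vary by at most `tolerance` relative to their mean."""
    tail = seen[-3:]
    mean = sum(tail) / len(tail)
    return mean > 0 and (max(tail) - min(tail)) / mean <= tolerance

# Hypothetical per-interval throughput samples in Mbps (ramp-up, then steady).
throughput = [10.0, 50.0, 90.0, 100.0, 101.0, 99.0, 100.0, 100.0]
estimate, used = run_with_early_stop(throughput, toy_predict_final, toy_should_stop)
```

Keeping the two functions separate means either one can be swapped for a learned model without touching the loop, which is the design point the abstract emphasizes.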
NI Jun 4, 2024
Learning Cellular Network Connection Quality with Conformal
Hanyang Jiang, Elizabeth Belding, Ellen Zegura et al.
In this paper, we address the problem of uncertainty quantification for cellular network speed. It is well known that the actual internet speed experienced by a mobile phone can fluctuate significantly, even when the phone remains in a single location. This high degree of variability underscores that mere point estimation of network speed is insufficient. Rather, it is advantageous to establish a prediction interval that can encompass the expected range of speed variations. In order to build an accurate network estimation map, a large amount of mobile data needs to be collected at different locations. Currently, public datasets rely on users to upload data through apps. Although a massive amount of data has been collected, the datasets suffer from significant noise due to the nature of cellular networks and various other factors. Additionally, the uneven distribution of population density affects the spatial consistency of data collection, leading to substantial uncertainty in the network quality maps derived from this data. We focus our analysis on large-scale internet-quality datasets provided by Ookla to construct an estimated map of connection quality. To improve the reliability of this map, we introduce a novel conformal prediction technique to build an uncertainty map. We identify regions with heightened uncertainty to prioritize targeted, manual data collection. In addition, the uncertainty map quantifies how reliable the prediction is in different areas. Our method also leads to a sampling strategy that guides researchers to selectively gather high-quality data that best complement the current dataset to improve the overall accuracy of the prediction model.
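For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard split conformal procedure for building a prediction interval around a point estimate. It is not the novel technique the abstract refers to, whose details are not given here; the function name, the use of absolute residuals as scores, and the calibration data are assumptions for illustration.

```python
import math

def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Interval around `new_pred` with about (1 - alpha) marginal coverage.

    `cal_true` and `cal_pred` are observed values and model predictions on a
    held-out calibration set that was not used to fit the model.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the calibration quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)

# Hypothetical calibration speeds (Mbps) against a constant prediction of 11.
cal_true = [10, 12, 9, 15, 11, 14, 8, 13, 10, 12]
cal_pred = [11] * 10
interval = split_conformal_interval(cal_true, cal_pred, new_pred=11, alpha=0.2)
# interval == (8, 14)
```

Wider intervals from such a procedure flag locations where predictions are less reliable, which is the role the uncertainty map plays in the abstract.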
LG Nov 9, 2019
Towards Understanding Gender Bias in Relation Extraction
Andrew Gaut, Tony Sun, Shirlyn Tang et al.
Recent developments in Neural Relation Extraction (NRE) have made significant strides towards Automated Knowledge Base Construction (AKBC). While much attention has been dedicated towards improvements in accuracy, there have been, to our knowledge, no attempts in the literature to evaluate social biases in NRE systems. We create WikiGenderBias, a distantly supervised dataset with a human-annotated test set. WikiGenderBias has sentences specifically curated to analyze gender bias in relation extraction systems. We use WikiGenderBias to evaluate systems for bias and find that NRE systems exhibit gender-biased predictions, laying the groundwork for future evaluation of bias in NRE. We also analyze how name anonymization, hard debiasing for word embeddings, and counterfactual data augmentation affect gender bias in predictions and performance.
CL Sep 10, 2019
A Benchmark Dataset for Learning to Intervene in Online Hate Speech
Jing Qian, Anna Bethke, Yinyin Liu et al.
Countering online hate speech is a critical yet challenging task, but one which can be aided by the use of Natural Language Processing (NLP) techniques. Previous research has primarily focused on the development of NLP methods to automatically and effectively detect online hate speech while disregarding further action needed to calm and discourage individuals from using hate speech in the future. In addition, most existing hate speech datasets treat each post as an isolated instance, ignoring the conversational context. In this paper, we propose a novel task of generative hate speech intervention, where the goal is to automatically generate responses to intervene during online conversations that contain hate speech. As a part of this work, we introduce two fully-labeled large-scale hate speech intervention datasets collected from Gab and Reddit. These datasets provide conversation segments, hate speech labels, as well as intervention responses written by Mechanical Turk Workers. In this paper, we also analyze the datasets to understand the common intervention strategies and explore the performance of common automatic response generation methods on these new datasets to provide a benchmark for future research.
CL Jun 21, 2019
Mitigating Gender Bias in Natural Language Processing: Literature Review
Tony Sun, Andrew Gaut, Shirlyn Tang et al.
As Natural Language Processing (NLP) and Machine Learning (ML) tools rise in popularity, it becomes increasingly vital to recognize the role they play in shaping societal biases and stereotypes. Although NLP models have shown success in modeling various applications, they propagate and may even amplify gender bias found in text corpora. While the study of bias in artificial intelligence is not new, methods to mitigate gender bias in NLP are relatively nascent. In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. We discuss gender bias based on four forms of representation bias and analyze methods recognizing gender bias. Furthermore, we discuss the advantages and drawbacks of existing gender debiasing methods. Finally, we discuss future studies for recognizing and mitigating gender bias in NLP.
CL Apr 4, 2019
Learning to Decipher Hate Symbols
Jing Qian, Mai ElSherief, Elizabeth Belding et al.
Existing computational models to understand hate speech typically frame the problem as a simple classification task, bypassing the understanding of hate symbols (e.g., 14 words, kigy) and their secret connotations. In this paper, we propose a novel task of deciphering hate symbols. To do this, we leverage the Urban Dictionary and collect a new, symbol-rich Twitter corpus of hate speech. We investigate neural network latent context models for deciphering hate symbols. More specifically, we study Sequence-to-Sequence models and show how they are able to crack the ciphers based on context. Furthermore, we propose a novel Variational Decipher and show how it can generalize better to unseen hate symbols in a more challenging testing setting.
AI Nov 3, 2018
SafeRoute: Learning to Navigate Streets Safely in an Urban Environment
Sharon Levy, Wenhan Xiong, Elizabeth Belding et al.
Recent studies show that 85% of women have changed their traveled route to avoid harassment and assault. Despite this, current mapping tools do not empower users with information to take charge of their personal safety. We propose SafeRoute, a novel solution to the problem of navigating cities and avoiding street harassment and crime. Unlike other street navigation applications, SafeRoute introduces a new type of path generation via deep reinforcement learning. This enables us to successfully optimize for multi-criteria path-finding and incorporate representation learning within our framework. Our agent learns to pick favorable streets to create a safe and short path with a reward function that incorporates safety and efficiency. Given access to recent crime reports in many urban cities, we train our model for experiments in Boston, New York, and San Francisco. We test our model on areas of these cities, specifically the populated downtown regions where tourists and those unfamiliar with the streets walk. We evaluate SafeRoute and successfully improve over state-of-the-art methods by up to 17% in local average distance from crimes while decreasing path length by up to 7%.
CL Aug 31, 2018
Hierarchical CVAE for Fine-Grained Hate Speech Classification
Jing Qian, Mai ElSherief, Elizabeth Belding et al.
Existing work on automated hate speech detection typically focuses on binary classification or on differentiating among a small set of categories. In this paper, we propose a novel method for a fine-grained hate speech classification task, which focuses on differentiating among 40 hate groups in 13 different hate group categories. We first explore the Conditional Variational Autoencoder (CVAE) as a discriminative model and then extend it to a hierarchical architecture to utilize the additional hate category information for more accurate prediction. Experimentally, we show that incorporating the hate category information for training can significantly improve the classification performance and that our proposed model outperforms commonly-used discriminative models.
CL Apr 11, 2018
Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media
Mai ElSherief, Vivek Kulkarni, Dana Nguyen et al.
While social media empowers freedom of expression and individual voices, it also enables anti-social behavior, online harassment, cyberbullying, and hate speech. In this paper, we deepen our understanding of online hate speech by focusing on a largely neglected but crucial aspect of hate speech -- its target: either "directed" towards a specific person or entity, or "generalized" towards a group of people sharing a common protected characteristic. We perform the first linguistic and psycholinguistic analysis of these two forms of hate speech and reveal the presence of interesting markers that distinguish these types of hate speech. Our analysis reveals that Directed hate speech, in addition to being more personal and directed, is more informal, angrier, and often explicitly attacks the target (via name calling) with fewer analytic words and more words suggesting authority and influence. Generalized hate speech, on the other hand, is dominated by religious hate and is characterized by the use of lethal words such as murder, exterminate, and kill, and quantity words such as million and many. Altogether, our work provides a data-driven analysis of the nuances of online hate speech that enables not only a deepened understanding of hate speech and its social implications but also its detection.