CYMar 27, 2023
(Un)fair devices: Moving beyond AI accuracy in personal sensingSofia Yfantidou, Marios Constantinides, Dimitris Spathis et al. · cambridge
Personal devices are omnipresent in our lives, seamlessly monitoring our activities, from smart rings tracking sleep patterns to smartwatches keeping an eye on missed heartbeats. The rich data streams from such devices fuel advanced Artificial Intelligence (AI) applications. Instead of solely relying on direct sensor measurements, these applications are increasingly leveraging Machine Learning (ML) model estimates to derive insights. But are these estimates biased or not? This literature review delivers compelling evidence about the impact of hidden biases that creep into ML models deployed on personal devices. We discuss critical bias issues drawn from prior work such as racial bias in pulse oximeters, weight bias in optical heart rate sensors, and sex bias in audio-based diagnostics. In response to these challenges, we advocate for a shift from prioritizing performance-oriented evaluations of personal devices to adopting assessments grounded in a human-centered approach. To facilitate this transition, we provide guidelines for the design, development, evaluation, and use of unbiased AI in personal devices, recognizing their potential impact on improving our health, lifestyle, and productivity -- more than any other technology.
CYMar 27, 2023
Uncovering Bias in Personal InformaticsSofia Yfantidou, Pavlos Sermpezis, Athena Vakali et al.
Personal informatics (PI) systems, powered by smartphones and wearables, enable people to lead healthier lifestyles by providing meaningful and actionable insights that break down barriers between users and their health information. Today, such systems are used by billions of users for monitoring not only physical activity and sleep but also vital signs and women's and heart health, among others. Despite their widespread usage, the processing of sensitive PI data may suffer from biases, which may entail practical and ethical implications. In this work, we present the first comprehensive empirical and analytical study of bias in PI systems, including biases in raw data and in the entire machine learning life cycle. We use the most detailed framework to date for exploring the different sources of bias and find that biases exist both in the data generation and the model learning and implementation streams. According to our results, the most affected minority groups are users with health issues, such as diabetes, joint issues, and hypertension, and female users, whose data biases are propagated or even amplified by learning models, while intersectional biases can also be observed.
AIDec 30, 2022
UBIWEAR: An end-to-end, data-driven framework for intelligent physical activity prediction to empower mHealth interventionsAsterios Bampakis, Sofia Yfantidou, Athena Vakali
It is indisputable that physical activity is vital for an individual's health and wellness. However, a global prevalence of physical inactivity has induced significant personal and socioeconomic implications. In recent years, a significant amount of work has showcased the capabilities of self-tracking technology to create positive health behavior change. This work is motivated by the potential of personalized and adaptive goal-setting techniques in encouraging physical activity via self-tracking. To this end, we propose UBIWEAR, an end-to-end framework for intelligent physical activity prediction, with the ultimate goal to empower data-driven goal-setting interventions. To achieve this, we experiment with numerous machine learning and deep learning paradigms as a robust benchmark for physical activity prediction tasks. To train our models, we utilize, "MyHeart Counts", an open, large-scale dataset collected in-the-wild from thousands of users. We also propose a prescriptive framework for self-tracking aggregated data preprocessing, to facilitate data wrangling of real-world, noisy data. Our best model achieves a MAE of 1087 steps, 65% lower than the state of the art in terms of absolute error, proving the feasibility of the physical activity prediction task, and paving the way for future research.
CLNov 4, 2022
Multilingual Name Entity Recognition and Intent Classification Employing Deep Learning ArchitecturesSofia Rizou, Antonia Paflioti, Angelos Theofilatos et al.
Named Entity Recognition and Intent Classification are among the most important subfields of the field of Natural Language Processing. Recent research has lead to the development of faster, more sophisticated and efficient models to tackle the problems posed by those two tasks. In this work we explore the effectiveness of two separate families of Deep Learning networks for those tasks: Bidirectional Long Short-Term networks and Transformer-based networks. The models were trained and tested on the ATIS benchmark dataset for both English and Greek languages. The purpose of this paper is to present a comparative study of the two groups of networks for both languages and showcase the results of our experiments. The models, being the current state-of-the-art, yielded impressive results and achieved high performance.
AIJul 30, 2024
Rolling in the deep of cognitive and AI biasesNicoleta Tantalaki, Athena Vakali
Nowadays, we delegate many of our decisions to Artificial Intelligence (AI) that acts either in solo or as a human companion in decisions made to support several sensitive domains, like healthcare, financial services and law enforcement. AI systems, even carefully designed to be fair, are heavily criticized for delivering misjudged and discriminated outcomes against individuals and groups. Numerous work on AI algorithmic fairness is devoted on Machine Learning pipelines which address biases and quantify fairness under a pure computational view. However, the continuous unfair and unjust AI outcomes, indicate that there is urgent need to understand AI as a sociotechnical system, inseparable from the conditions in which it is designed, developed and deployed. Although, the synergy of humans and machines seems imperative to make AI work, the significant impact of human and societal factors on AI bias is currently overlooked. We address this critical issue by following a radical new methodology under which human cognitive biases become core entities in our AI fairness overview. Inspired by the cognitive science definition and taxonomy of human heuristics, we identify how harmful human actions influence the overall AI lifecycle, and reveal human to AI biases hidden pathways. We introduce a new mapping, which justifies the human heuristics to AI biases reflections and we detect relevant fairness intensities and inter-dependencies. We envision that this approach will contribute in revisiting AI fairness under deeper human-centric case studies, revealing hidden biases cause and effects.
LGJan 3, 2024
Evaluating Fairness in Self-supervised and Supervised Models for Sequential DataSofia Yfantidou, Dimitris Spathis, Marios Constantinides et al. · cambridge
Self-supervised learning (SSL) has become the de facto training paradigm of large models where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Hypothesizing that SSL models would learn more generic, hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness (i.e., performing equally on different demographic breakdowns). Motivated by human-centric applications on real-world timeseries data, we interpret inductive biases on the model, layer, and metric levels by systematically comparing SSL models to their supervised counterparts. Our findings demonstrate that SSL has the capacity to achieve performance on par with supervised methods while significantly enhancing fairness--exhibiting up to a 27% increase in fairness with a mere 1% loss in performance through self-supervision. Ultimately, this work underscores SSL's potential in human-centric computing, particularly high-stakes, data-scarce application domains like healthcare.
CYOct 17, 2024
Towards Hybrid Intelligence in Journalism: Findings and Lessons Learnt from a Collaborative Analysis of Greek Political Rhetoric by ChatGPT and HumansThanasis Troboukis, Kelly Kiki, Antonis Galanopoulos et al.
This chapter introduces a research project titled "Analyzing the Political Discourse: A Collaboration Between Humans and Artificial Intelligence", which was initiated in preparation for Greece's 2023 general elections. The project focused on the analysis of political leaders' campaign speeches, employing Artificial Intelligence (AI), in conjunction with an interdisciplinary team comprising journalists, a political scientist, and data scientists. The chapter delves into various aspects of political discourse analysis, including sentiment analysis, polarization, populism, topic detection, and Named Entities Recognition (NER). This experimental study investigates the capabilities of large language model (LLMs), and in particular OpenAI's ChatGPT, for analyzing political speech, evaluates its strengths and weaknesses, and highlights the essential role of human oversight in using AI in journalism projects and potentially other societal sectors. The project stands as an innovative example of human-AI collaboration (known also as "hybrid intelligence") within the realm of digital humanities, offering valuable insights for future initiatives.
CYNov 27, 2025
Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm AnticipationNicoleta Tantalaki, Sophia Vei, Athena Vakali
The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded across the AI lifecycle. While existing frameworks and taxonomies document bias or harms in isolation, they rarely establish systematic links between specific bias types and the harms they cause, particularly within real-world sociotechnical contexts. Technical fixes proposed to address AI biases are ill-equipped to address them and are typically applied after a system has been developed or deployed, offering limited preventive value. We propose ECHO, a novel framework for proactive AI harm anticipation through the systematic mapping of AI bias types to harm outcomes across diverse stakeholder and domain contexts. ECHO follows a modular workflow encompassing stakeholder identification, vignette-based presentation of biased AI systems, and dual (human-LLM) harm annotation, integrated within ethical matrices for structured interpretation. This human-centered approach enables early-stage detection of bias-to-harm pathways, guiding AI design and governance decisions from the outset. We validate ECHO in two high-stakes domains (disease diagnosis and hiring), revealing domain-specific, bias-to-harm patterns and demonstrating ECHO's potential to support anticipatory governance of AI systems
NISep 23, 2025
Poster: ChatIYP: Enabling Natural Language Access to the Internet Yellow Pages DatabaseVasilis Andritsoudis, Pavlos Sermpezis, Ilias Dimitriadis et al.
The Internet Yellow Pages (IYP) aggregates information from multiple sources about Internet routing into a unified, graph-based knowledge base. However, querying it requires knowledge of the Cypher language and the exact IYP schema, thus limiting usability for non-experts. In this paper, we propose ChatIYP, a domain-specific Retrieval-Augmented Generation (RAG) system that enables users to query IYP through natural language questions. Our evaluation demonstrates solid performance on simple queries, as well as directions for improvement, and provides insights for selecting evaluation metrics that are better fit for IYP querying AI agents.
AISep 12, 2025
AI Harmonics: a human-centric and harms severity-adaptive AI risk assessment frameworkSofia Vei, Paolo Giudici, Pavlos Sermpezis et al.
The absolute dominance of Artificial Intelligence (AI) introduces unprecedented societal harms and risks. Existing AI risk assessment models focus on internal compliance, often neglecting diverse stakeholder perspectives and real-world consequences. We propose a paradigm shift to a human-centric, harm-severity adaptive approach grounded in empirical incident data. We present AI Harmonics, which includes a novel AI harm assessment metric (AIH) that leverages ordinal severity data to capture relative impact without requiring precise numerical estimates. AI Harmonics combines a robust, generalized methodology with a data-driven, stakeholder-aware framework for exploring and prioritizing AI harms. Experiments on annotated incident data confirm that political and physical harms exhibit the highest concentration and thus warrant urgent mitigation: political harms erode public trust, while physical harms pose serious, even life-threatening risks, underscoring the real-world relevance of our approach. Finally, we demonstrate that AI Harmonics consistently identifies uneven harm distributions, enabling policymakers and organizations to target their mitigation efforts effectively.
LGJun 13, 2025
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AIEva Paraschou, Ioannis Arapakis, Sofia Yfantidou et al.
Artificial Intelligence (AI) is rapidly embedded in critical decision-making systems, however their foundational ``black-box'' models require eXplainable AI (XAI) solutions to enhance transparency, which are mostly oriented to experts, making no sense to non-experts. Alarming evidence about AI's unprecedented human values risks brings forward the imperative need for transparent human-centered XAI solutions. In this work, we introduce a domain-, model-, explanation-agnostic, generalizable and reproducible framework that ensures both transparency and human-centered explanations tailored to the needs of both experts and non-experts. The framework leverages Large Language Models (LLMs) and employs in-context learning to convey domain- and explainability-relevant contextual knowledge into LLMs. Through its structured prompt and system setting, our framework encapsulates in one response explanations understandable by non-experts and technical information to experts, all grounded in domain and explainability principles. To demonstrate the effectiveness of our framework, we establish a ground-truth contextual ``thesaurus'' through a rigorous benchmarking with over 40 data, model, and XAI combinations for an explainable clustering analysis of a well-being scenario. Through a comprehensive quality and human-friendliness evaluation of our framework's explanations, we prove high content quality through strong correlations with ground-truth explanations (Spearman rank correlation=0.92) and improved interpretability and human-friendliness to non-experts through a user study (N=56). Our overall evaluation confirms trust in LLMs as HCXAI enablers, as our framework bridges the above Gaps by delivering (i) high-quality technical explanations aligned with foundational XAI methods and (ii) clear, efficient, and interpretable human-centered explanations for non-experts.
CYJun 10, 2025
FAIRTOPIA: Envisioning Multi-Agent Guardianship for Disrupting Unfair AI PipelinesAthena Vakali, Ilias Dimitriadis
AI models have become active decision makers, often acting without human supervision. The rapid advancement of AI technology has already caused harmful incidents that have hurt individuals and societies and AI unfairness in heavily criticized. It is urgent to disrupt AI pipelines which largely neglect human principles and focus on computational biases exploration at the data (pre), model(in), and deployment (post) processing stages. We claim that by exploiting the advances of agents technology, we will introduce cautious, prompt, and ongoing fairness watch schemes, under realistic, systematic, and human-centric fairness expectations. We envision agents as fairness guardians, since agents learn from their environment, adapt to new information, and solve complex problems by interacting with external tools and other systems. To set the proper fairness guardrails in the overall AI pipeline, we introduce a fairness-by-design approach which embeds multi-role agents in an end-to-end (human to AI) synergetic scheme. Our position is that we may design adaptive and realistic AI fairness frameworks, and we introduce a generalized algorithm which can be customized to the requirements and goals of each AI decision making scenario. Our proposed, so called FAIRTOPIA framework, is structured over a three-layered architecture, which encapsulates the AI pipeline inside an agentic guardian and a knowledge-based, self-refining layered scheme. Based on our proposition, we enact fairness watch in all of the AI pipeline stages, under robust multi-agent workflows, which will inspire new fairness research hypothesis, heuristics, and methods grounded in human-centric, systematic, interdisciplinary, socio-technical principles.
CLJan 9, 2025
AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AIPavlos Sermpezis, Stelios Karamanidis, Eva Paraschou et al.
Political discourse datasets are important for gaining political insights, analyzing communication strategies or social science phenomena. Although numerous political discourse corpora exist, comprehensive, high-quality, annotated datasets are scarce. This is largely due to the substantial manual effort, multidisciplinarity, and expertise required for the nuanced annotation of rhetorical strategies and ideological contexts. In this paper, we present AgoraSpeech, a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection. A two-step annotation was employed, starting with ChatGPT-generated annotations and followed by exhaustive human-in-the-loop validation. The dataset was initially used in a case study to provide insights during the pre-election period. However, it has general applicability by serving as a rich source of information for political and social scientists, journalists, or data scientists, while it can be used for benchmarking and fine-tuning NLP and large language models (LLMs).
LGJun 4, 2024
Using Self-supervised Learning Can Improve Model FairnessSofia Yfantidou, Dimitris Spathis, Marios Constantinides et al.
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating comparable performance with supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally on different demographic breakdowns) are lacking. Hypothesizing that SSL models would learn more generic, hence less biased representations, this study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes. We evaluate our method's generalizability on three real-world human-centric datasets (i.e., MIMIC, MESA, and GLOBEM) by systematically comparing hundreds of SSL and fine-tuned models on various dimensions spanning from the intermediate representations to appropriate evaluation metrics. Our findings demonstrate that SSL can significantly improve model fairness, while maintaining performance on par with supervised methods-exhibiting up to a 30% increase in fairness with minimal loss in performance through self-supervision. We posit that such differences can be attributed to representation dissimilarities found between the best- and the worst-performing demographics across models-up to x13 greater for protected attributes with larger performance discrepancies between segments.
LGSep 6, 2021
Pointspectrum: Equivariance Meets Laplacian Filtering for Graph Representation LearningMarinos Poiitis, Pavlos Sermpezis, Athena Vakali
Graph Representation Learning (GRL) has become essential for modern graph data mining and learning tasks. GRL aims to capture the graph's structural information and exploit it in combination with node and edge attributes to compute low-dimensional representations. While Graph Neural Networks (GNNs) have been used in state-of-the-art GRL architectures, they have been shown to suffer from over smoothing when many GNN layers need to be stacked. In a different GRL approach, spectral methods based on graph filtering have emerged addressing over smoothing; however, up to now, they employ traditional neural networks that cannot efficiently exploit the structure of graph data. Motivated by this, we propose PointSpectrum, a spectral method that incorporates a set equivariant network to account for a graph's structure. PointSpectrum enhances the efficiency and expressiveness of spectral methods, while it outperforms or competes with state-of-the-art GRL methods. Overall, PointSpectrum addresses over smoothing by employing a graph filter and captures a graph's structure through set equivariance, lying on the intersection of GNNs and spectral methods. Our findings are promising for the benefits and applicability of this architectural shift for spectral methods and GRL.
HCApr 23, 2021
14 Years of Self-Tracking Technology for mHealth -- Literature Review: Lessons Learnt and the PAST SELF FrameworkSofia Yfantidou, Pavlos Sermpezis, Athena Vakali
In today's connected society, many people rely on mHealth and self-tracking (ST) technology to help them adopt healthier habits with a focus on breaking their sedentary lifestyle and staying fit. However, there is scarce evidence of such technological interventions' effectiveness, and there are no standardized methods to evaluate their impact on people's physical activity (PA) and health. This work aims to help ST practitioners and researchers by empowering them with systematic guidelines and a framework for designing and evaluating technological interventions to facilitate health behavior change (HBC) and user engagement (UE), focusing on increasing PA and decreasing sedentariness. To this end, we conduct a literature review of 129 papers between 2008 and 2022, which identifies the core ST HCI design methods and their efficacy, as well as the most comprehensive list to date of UE evaluation metrics for ST. Based on the review's findings, we propose PAST SELF, a framework to guide the design and evaluation of ST technology that has potential applications in industrial and scientific settings. Finally, to facilitate researchers and practitioners, we complement this paper with an open corpus and an online, adaptive exploration tool for the PAST SELF data.
SISep 22, 2020
My tweets bring all the traits to the yard: Predicting personality and relational traits in Online Social NetworksDimitra Karanatsiou, Pavlos Sermpezis, Jon Gruda et al.
Users in Online Social Networks (OSN) leaves traces that reflect their personality characteristics. The study of these traces is important for a number of fields, such as a social science, psychology, OSN, marketing, and others. Despite a marked increase on research in personality prediction on based on online behavior the focus has been heavily on individual personality traits largely neglecting relational facets of personality. This study aims to address this gap by providing a prediction model for a holistic personality profiling in OSNs that included socio-relational traits (attachment orientations) in combination with standard personality traits. Specifically, we first designed a feature engineering methodology that extracts a wide range of features (accounting for behavior, language, and emotions) from OSN accounts of users. Then, we designed a machine learning model that predicts scores for the psychological traits of the users based on the extracted features. The proposed model architecture is inspired by characteristics embedded in psychological theory, i.e, utilizing interrelations among personality facets, and leads to increased accuracy in comparison with the state of the art approaches. To demonstrate the usefulness of this approach, we applied our model to two datasets, one of random OSN users and one of organizational leaders, and compared their psychological profiles. Our findings demonstrate that the two groups can be clearly separated by only using their psychological profiles, which opens a promising direction for future research on OSN user characterization and classification.
SIJul 20, 2019
Detecting Cyberbullying and Cyberaggression in Social MediaDespoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn et al.
Cyberbullying and cyberaggression are increasingly worrisome phenomena affecting people across all demographics. More than half of young social media users worldwide have been exposed to such prolonged and/or coordinated digital harassment. Victims can experience a wide range of emotions, with negative consequences such as embarrassment, depression, isolation from other community members, which embed the risk to lead to even more critical consequences, such as suicide attempts. In this work, we take the first concrete steps to understand the characteristics of abusive behavior in Twitter, one of today's largest social media platforms. We analyze 1.2 million users and 2.1 million tweets, comparing users participating in discussions around seemingly normal topics like the NBA, to those more likely to be hate-related, such as the Gamergate controversy, or the gender pay inequality at the BBC station. We also explore specific manifestations of abusive behavior, i.e., cyberbullying and cyberaggression, in one of the hate-related communities (Gamergate). We present a robust methodology to distinguish bullies and aggressors from normal Twitter users by considering text, user, and network-based attributes. Using various state-of-the-art machine learning algorithms, we classify these accounts with over 90% accuracy and AUC. Finally, we discuss the current status of Twitter user accounts marked as abusive by our methodology, and study the performance of potential mechanisms that can be used by Twitter to suspend users in the future.
CLFeb 1, 2018
A Unified Deep Learning Architecture for Abuse DetectionAntigoni-Maria Founta, Despoina Chatzakou, Nicolas Kourtellis et al.
Hate speech, offensive language, sexism, racism and other types of abusive behavior have become a common phenomenon in many online social media platforms. In recent years, such diverse abusive behaviors have been manifesting with increased frequency and levels of intensity. This is due to the openness and willingness of popular media platforms, such as Twitter and Facebook, to host content of sensitive or controversial topics. However, these platforms have not adequately addressed the problem of online abusive behavior, and their responsiveness to the effective detection and blocking of such inappropriate behavior remains limited. In the present paper, we study this complex problem by following a more holistic approach, which considers the various aspects of abusive behavior. To make the approach tangible, we focus on Twitter data and analyze user and textual properties from different angles of abusive posting behavior. We propose a deep learning architecture, which utilizes a wide variety of available metadata, and combines it with automatically-extracted hidden patterns within the text of the tweets, to detect multiple abusive behavioral norms which are highly inter-related. We apply this unified architecture in a seamless, transparent fashion to detect different types of abusive behavior (hate speech, sexism vs. racism, bullying, sarcasm, etc.) without the need for any tuning of the model architecture for each task. We test the proposed approach with multiple datasets addressing different and multiple abusive behaviors on Twitter. Our results demonstrate that it largely outperforms the state-of-art methods (between 21 and 45\% improvement in AUC, depending on the dataset).
SIMay 9, 2017
Hate is not Binary: Studying Abusive Behavior of #GamerGate on TwitterDespoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn et al.
Over the past few years, online bullying and aggression have become increasingly prominent, and manifested in many different forms on social media. However, there is little work analyzing the characteristics of abusive users and what distinguishes them from typical social media users. In this paper, we start addressing this gap by analyzing tweets containing a great large amount of abusiveness. We focus on a Twitter dataset revolving around the Gamergate controversy, which led to many incidents of cyberbullying and cyberaggression on various gaming and social media platforms. We study the properties of the users tweeting about Gamergate, the content they post, and the differences in their behavior compared to typical Twitter users. We find that while their tweets are often seemingly about aggressive and hateful subjects, "Gamergaters" do not exhibit common expressions of online anger, and in fact primarily differ from typical users in that their tweets are less joyful. They are also more engaged than typical Twitter users, which is an indication as to how and why this controversy is still ongoing. Surprisingly, we find that Gamergaters are less likely to be suspended by Twitter, thus we analyze their properties to identify differences from typical users and what may have led to their suspension. We perform an unsupervised machine learning analysis to detect clusters of users who, though currently active, could be considered for suspension since they exhibit similar behaviors with suspended users. Finally, we confirm the usefulness of our analyzed features by emulating the Twitter suspension mechanism with a supervised learning method, achieving very good precision and recall.
SIFeb 24, 2017
Measuring #GamerGate: A Tale of Hate, Sexism, and BullyingDespoina Chatzakou, Nicolas Kourtellis, Jeremy Blackburn et al.
Over the past few years, online aggression and abusive behaviors have occurred in many different forms and on a variety of platforms. In extreme cases, these incidents have evolved into hate, discrimination, and bullying, and even materialized into real-world threats and attacks against individuals or groups. In this paper, we study the Gamergate controversy. Started in August 2014 in the online gaming world, it quickly spread across various social networking platforms, ultimately leading to many incidents of cyberbullying and cyberaggression. We focus on Twitter, presenting a measurement study of a dataset of 340k unique users and 1.6M tweets to study the properties of these users, the content they post, and how they differ from random Twitter users. We find that users involved in this "Twitter war" tend to have more friends and followers, are generally more engaged and post tweets with negative sentiment, less joy, and more hate than random users. We also perform preliminary measurements on how the Twitter suspension mechanism deals with such abusive behaviors. While we focus on Gamergate, our methodology to collect and analyze tweets related to aggressive and bullying activities is of independent interest.