Sotiris Ioannidis

h-index: 87
22 papers
176 citations
Novelty: 37%
AI Score: 51

22 Papers

SP Jun 4, 2018
A New Wireless Communication Paradigm through Software-controlled Metasurfaces

Christos Liaskos, Shuai Nie, Ageliki Tsioliaridou et al.

Electromagnetic waves undergo multiple uncontrollable alterations as they propagate within a wireless environment. Free space path loss, signal absorption, as well as reflections, refractions and diffractions caused by physical objects within the environment strongly affect the performance of wireless communications. Currently, such effects are intractable to account for and are treated as probabilistic factors. The paper proposes a radically different approach, enabling deterministic, programmable control over the behavior of wireless environments. The key enabler is the so-called HyperSurface tile, a novel class of planar meta-materials which can interact with impinging electromagnetic waves in a controlled manner. The HyperSurface tiles can effectively re-engineer electromagnetic waves, including steering towards any desired direction, full absorption, polarization manipulation and more. Multiple tiles are employed to coat objects such as walls and furniture and, more generally, any object in indoor and outdoor environments. An external software service calculates and deploys the optimal interaction types per tile, to best fit the needs of communicating devices. Evaluation via simulations highlights the potential of the new concept.

NI Jun 4, 2018
Using any Surface to Realize a New Paradigm for Wireless Communications

Christos Liaskos, Ageliki Tsioliaridou, Andreas Pitsillides et al.

This article introduces an approach that could tame wireless channels, making their behavior deterministic and software-defined. We investigate the novel idea of HyperSurfaces, which are software-controlled metamaterials embedded in any surface in the environment. HyperSurfaces are materials that interact with electromagnetic waves in a fully software-defined fashion, even unnaturally. Coating walls, doors, furniture and other objects with HyperSurfaces makes the overall behavior of an indoor wireless environment programmable. Thus, the electromagnetic behavior of the environment as a whole can be controlled and tailored to the needs of mobile devices within it.

ET May 17, 2018
Realizing Wireless Communication through Software-defined HyperSurface Environments

Christos Liaskos, Shuai Nie, Ageliki Tsioliaridou et al.

Wireless communication environments are unaware of the ongoing data exchange efforts within them. Moreover, their effect on the communication quality is intractable in all but the simplest cases. The present work proposes a new paradigm, where indoor scattering becomes software-defined and, subsequently, optimizable across wide frequency ranges. Moreover, the controlled scattering can surpass natural behavior, for example by overriding Snell's law and reflecting waves towards any custom angle (including negative ones). Thus, path loss and multi-path fading effects can be controlled and mitigated. The core technology of this new paradigm is metasurfaces, planar artificial structures whose effect on impinging electromagnetic waves is fully defined by their macro-structure. The present study contributes the software-programmable wireless environment model, consisting of several HyperSurface tiles controlled by a central environment configuration server. HyperSurfaces are a novel class of metasurfaces whose structure and, hence, electromagnetic behavior can be altered and controlled via a software interface. Multiple networked tiles coat indoor objects, allowing fine-grained, customizable reflection, absorption or polarization overall. A central server calculates and deploys the optimal electromagnetic interaction per tile, to the benefit of communicating devices. Realistic simulations using full 3D ray-tracing demonstrate the groundbreaking potential of the proposed approach at 2.4 GHz and 60 GHz frequencies.
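
The abstract does not state which steering relation the tiles rely on. As an illustration only, the sketch below uses the generalized law of reflection commonly cited in the metasurface literature, sin(theta_r) - sin(theta_i) = (lambda / 2*pi) * dPhi/dx, to show how a reflection angle other than the specular one can be requested. The function names and the free-space assumption are hypothetical and are not taken from the paper.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s (free-space assumption)


def required_phase_gradient(theta_i_deg, theta_r_deg, freq_hz):
    """Phase gradient dPhi/dx (rad/m) that steers a wave arriving at
    theta_i towards theta_r, per the generalized law of reflection."""
    wavelength = C / freq_hz
    k0 = 2 * math.pi / wavelength
    return k0 * (math.sin(math.radians(theta_r_deg))
                 - math.sin(math.radians(theta_i_deg)))


def reflection_angle(theta_i_deg, phase_gradient, freq_hz):
    """Reflection angle (degrees) produced by a given phase gradient.
    Returns None when no propagating reflected wave exists."""
    wavelength = C / freq_hz
    s = (math.sin(math.radians(theta_i_deg))
         + phase_gradient * wavelength / (2 * math.pi))
    if abs(s) > 1:
        return None
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    # Steer a 60 GHz wave arriving at 30 degrees towards -20 degrees.
    grad = required_phase_gradient(30.0, -20.0, 60e9)
    print(grad, reflection_angle(30.0, grad, 60e9))
```

A zero gradient reproduces ordinary specular reflection, and a negative target angle simply yields a negative gradient.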

SI Apr 7, 2022
Twitter Dataset on the Russo-Ukrainian War

Alexander Shevtsov, Christos Tzagkarakis, Despoina Antonakaki et al.

On 24 February 2022, Russia invaded Ukraine, starting what is now known as the Russo-Ukrainian War. We have initiated an ongoing dataset acquisition from the Twitter API. As of the day this paper was written, the dataset had reached 57.3 million tweets, originating from 7.7 million users. We apply an initial volume and sentiment analysis, and the dataset can support further exploratory investigation of topic analysis, hate speech, and propaganda recognition, or even reveal potentially malicious entities such as botnets.

SP Nov 4, 2025
RIS-Assisted 3D Spherical Splatting for Object Composition Visualization using Detection Transformers

Anastasios T. Sotiropoulos, Stavros Tsimpoukis, Dimitrios Tyrovolas et al.

The pursuit of immersive and structurally aware multimedia experiences has intensified interest in sensing modalities that reconstruct objects beyond the limits of visible light. Conventional optical pipelines degrade under occlusion or low illumination, motivating the use of radio-frequency (RF) sensing, whose electromagnetic waves penetrate materials and encode both geometric and compositional information. Yet, uncontrolled multipath propagation restricts reconstruction accuracy. Recent advances in Programmable Wireless Environments (PWEs) mitigate this limitation by enabling software-defined manipulation of propagation through Reconfigurable Intelligent Surfaces (RISs), thereby providing controllable illumination diversity. Building on this capability, this work introduces a PWE-driven RF framework for three-dimensional object reconstruction using material-aware spherical primitives. The proposed approach combines RIS-enabled field synthesis with a Detection Transformer (DETR) that infers spatial and material parameters directly from extracted RF features. Simulation results confirm the framework's ability to approximate object geometries and classify material composition with an overall accuracy of 79.35%, marking an initial step toward programmable and physically grounded RF-based 3D object composition visualization.

SI Jun 6, 2023
Russo-Ukrainian War: Prediction and explanation of Twitter suspension

Alexander Shevtsov, Despoina Antonakaki, Ioannis Lamprou et al.

On 24 February 2022, Russia invaded Ukraine, starting what is now known as the Russo-Ukrainian War and initiating an online discourse on social media. Twitter, as one of the most popular social networks (SNs), with an open and democratic character, enables a transparent discussion among its large user base. Unfortunately, this often leads to violations of Twitter's policies, propaganda, abusive actions, civil integrity violations, and consequently to the suspension and deletion of user accounts. This study focuses on the Twitter suspension mechanism and the analysis of shared content and features of the user accounts that may lead to this. Toward this goal, we have obtained a dataset containing 107.7M tweets, originating from 9.8 million users, using the Twitter API. We extract the categories of shared content of the suspended accounts and explain their characteristics, through the extraction of text embeddings in conjunction with cosine similarity clustering. Our results reveal scam campaigns taking advantage of trending topics regarding the Russo-Ukrainian conflict for Bitcoin and Ethereum fraud, spam, and advertisement campaigns. Additionally, we apply a machine learning methodology including a SHapley Additive exPlanations (SHAP) explainability model to understand and explain how user accounts get suspended.

LG Jun 15, 2022
Evaluating Short-Term Forecasting of Multiple Time Series in IoT Environments

Christos Tzagkarakis, Pavlos Charalampidis, Stylianos Roubakis et al.

Modern Internet of Things (IoT) environments are monitored via a large number of IoT-enabled sensing devices, with the data acquisition and processing infrastructure setting restrictions in terms of computational power and energy resources. To alleviate this issue, sensors are often configured to operate at relatively low sampling frequencies, yielding a reduced set of observations. Nevertheless, this can dramatically hamper subsequent decision-making, such as forecasting. To address this problem, in this work we evaluate short-term forecasting in highly underdetermined cases, i.e., the number of sensor streams is much higher than the number of observations. Several statistical, machine learning and neural network-based models are thoroughly examined with respect to the resulting forecasting accuracy on five different real-world datasets. The focus is on a unified experimental protocol especially designed for short-term prediction of multiple time series at the IoT edge. The proposed framework can be considered an important step towards establishing a solid forecasting strategy in resource-constrained IoT applications.

SI May 20
Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse

Despoina Antonakaki, Sotiris Ioannidis

The proliferation of large language models has introduced a new paradigm of synthetic political communication in which narratives may be generated, semantically coordinated, and strategically disseminated across platforms at scale. We present a cross-platform framework for detecting synthetic political narratives using four coordination signals, namely lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C), combined into a Synthetic Narrative Coordination Score SNC(C). We apply the framework to a corpus of 353,223 records spanning six geopolitical event windows collected from six Telegram channels and nine Reddit communities (2023-2026). Results show that IntelSlava exhibits the lowest lexical diversity (MATTR 0.52-0.54), the highest burstiness (B=+0.48 to +0.73), and the highest rhetorical overlap with peer channels (Jaccard 0.12), ranking first in the composite SNC(C) on four of six event windows (SNC 0.45-0.60). Rybar ranks last on all windows despite its high semantic homogenization, because its Russian-language output yields high lexical diversity and near-zero rhetorical Jaccard with English-language channels, demonstrating that no single indicator is sufficient for coordination detection. Multi-dimensional SNC(C) scoring provides a more robust and interpretable signal than any individual metric.
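
Three of the signals named above can be sketched with the standard library. The abstract does not give the exact formulas or the weighting used in SNC(C), so the definitions here are assumptions: MATTR as a moving-average type-token ratio, burstiness as the widely used (sigma - mu) / (sigma + mu) of inter-event gaps, and rhetorical overlap as Jaccard similarity between phrase sets. The function names and the window size are hypothetical.

```python
import statistics


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean share of unique tokens
    over every sliding window of the given size."""
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


def burstiness(timestamps):
    """(sigma - mu) / (sigma + mu) over inter-event gaps.
    -1 means perfectly regular posting; values near +1 mean bursty."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return 0.0
    mu = statistics.fmean(gaps)
    sigma = statistics.pstdev(gaps)
    if mu + sigma == 0:
        return 0.0
    return (sigma - mu) / (sigma + mu)


def jaccard(a, b):
    """Jaccard overlap between two collections of phrases."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(mattr("the war the war the war".split(), window=2))
    print(burstiness([0, 1, 2, 3, 50, 51, 52, 200]))
    print(jaccard({"breaking", "confirmed"}, {"confirmed", "sources"}))
```

How these scores would be normalized and combined into a single composite is left open here, since the abstract does not specify it.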

CY Apr 16
From Parliamentary Rhetoric to Enacted Law: An NLP Pipeline for Semantic Auditing of the Greek Legislative Process

Despoina Antonakaki, Sotiris Ioannidis

The Greek legislative framework is characterized by intricate cross-referencing, frequent amendments, and limited machine-readable access, hindering transparency and civic engagement. Traditional bulk-archiving approaches are computationally expensive and fail to capture political relevance. We present a multimodal computational pipeline that bridges parliamentary discourse with enacted legislation. Applying Natural Language Processing (NLP) to 2025 Hellenic Parliament transcripts, we extracted 534 unique law citations and used debate frequency as an empirical signal to identify politically salient laws. A headless browser architecture enables automated acquisition of official Government Gazette documents despite anti-scraping barriers. Using Large Language Models (LLMs), we conduct a semantic audit of legislative quality. Our analysis reveals an "Illusion of Simplicity", where laws framed as simplifications exhibit high structural complexity and ambiguity. A typology of 312 ambiguity instances shows that 45 percent stem from vague terminology and 25 percent from deferred executive delegation. We introduce the Political Discrepancy Index (PDI), evaluating alignment between ministerial promises and enacted law. Across three high-frequency laws (4808/2021, 4412/2016, 4662/2020), the dominant outcome is Deferral, with commitments shifted to future Ministerial Decisions. Cross-reference network analysis confirms a highly entangled legal system, with foundational provisions among the most frequently amended. The pipeline produces a semantically linked dataset and an interactive auditing interface for scalable analysis of legislative processes.

SI Mar 3
Cross-Platform Digital Discourse Analysis of Iran: Topics, Sentiment, Polarization, and Event Validation on Telegram and Reddit

Despoina Antonakaki, Sotiris Ioannidis

We analyze Iran-related discourse across two structurally different platforms: Telegram (7,567 messages from international news channels) and Reddit (23,909 posts and comments from Iran-focused and global communities). Using a single reproducible pipeline, we apply NMF topic modeling over TF-IDF features, VADER sentiment scoring, and a keyword-bundle escalation index capturing military, nuclear, and diplomatic narratives. To assess whether discourse dynamics track offline developments, we compare escalation time series with external protest and geopolitical event timelines using same-day and lagged correlation analysis. Same-day correlations are weak, but the strongest relationships occur at non-zero lags, consistent with anticipatory or reactive framing rather than instantaneous mirroring. Finally, using a separate real-time collection (February 2026), we observe synchronized increases in escalation-related narratives that coincide with documented geopolitical developments. Overall, the results show systematic cross-platform differences in narrative structure and tone, and provide quantitative evidence that online escalation signals can align with real-world developments with measurable temporal offsets.
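
The same-day versus lagged comparison described above can be illustrated with a small sketch. It assumes two equally long daily series and uses Pearson correlation; the abstract does not say which correlation measure or lag range was used, and the names `lagged_correlation` and `best_lag` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def lagged_correlation(discourse, events, lag):
    """Correlate discourse[t + lag] with events[t].
    lag > 0: discourse follows events (reactive);
    lag < 0: discourse precedes events (anticipatory)."""
    if lag > 0:
        x, y = discourse[lag:], events[:-lag]
    elif lag < 0:
        x, y = discourse[:lag], events[-lag:]
    else:
        x, y = discourse, events
    return pearson(x, y)


def best_lag(discourse, events, max_lag):
    """Lag with the largest absolute correlation, plus all scores."""
    scores = {lag: lagged_correlation(discourse, events, lag)
              for lag in range(-max_lag, max_lag + 1)}
    valid = {k: v for k, v in scores.items() if not math.isnan(v)}
    return max(valid, key=lambda k: abs(valid[k])), scores


if __name__ == "__main__":
    events = [0, 0, 5, 0, 0, 1, 0, 3, 0, 0, 0, 2]
    discourse = [0, 0] + events[:-2]  # discourse echoes events two days later
    print(best_lag(discourse, events, max_lag=3)[0])
```

In this toy input the discourse series is a copy of the event series delayed by two days, so the strongest relationship appears at a non-zero lag rather than on the same day.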

CY Nov 27, 2025
Cross-Platform Digital Discourse Analysis of the Israel-Hamas Conflict: Sentiment, Topics, and Event Dynamics

Despoina Antonakaki, Sotiris Ioannidis

The Israeli-Palestinian conflict remains one of the most polarizing geopolitical issues, with the October 2023 escalation intensifying online debate. Social media platforms, particularly Telegram, have become central to real-time news sharing, advocacy, and propaganda. In this study, we analyze Telegram, Twitter/X, and Reddit to examine how conflict narratives are produced, amplified, and contested across different digital spheres. Building on our previous work on Telegram discourse during the 2023 escalation, we extend the analysis longitudinally and cross-platform using an updated dataset spanning October 2023 to mid-2025. The corpus includes more than 187,000 Telegram messages, 2.1 million Reddit comments, and curated Twitter/X posts. We combine Latent Dirichlet Allocation (LDA), BERTopic, and transformer-based sentiment and emotion models to identify dominant themes, emotional dynamics, and propaganda strategies. Telegram channels provide unfiltered, high-intensity documentation of events; Twitter/X amplifies frames to global audiences; and Reddit hosts more reflective and deliberative discussions. Our findings reveal persistent negative sentiment, strong coupling between humanitarian framing and solidarity expressions, and platform-specific pathways for the diffusion of pro-Palestinian and pro-Israeli narratives. This paper offers three contributions: (1) a multi-platform, FAIR-compliant dataset on the Israel-Hamas war, (2) an integrated pipeline combining topic modeling, sentiment and emotion analysis, and spam filtering for large-scale conflict discourse, and (3) empirical insights into how platform affordances and affective publics shape the evolution of digital conflict communication.

CR Jul 11, 2025
White-Basilisk: A Hybrid Model for Code Vulnerability Detection

Ioannis Lamprou, Alexander Shevtsov, Ioannis Arapakis et al.

The proliferation of software vulnerabilities presents a significant challenge to cybersecurity, necessitating more effective detection methodologies. We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance while challenging prevailing assumptions in AI model scaling. Utilizing an innovative architecture that integrates Mamba layers, linear self-attention, and a Mixture of Experts framework, White-Basilisk achieves state-of-the-art results in vulnerability detection tasks with a parameter count of only 200M. The model's capacity to process sequences of unprecedented length enables comprehensive analysis of extensive codebases in a single pass, surpassing the context limitations of current Large Language Models (LLMs). White-Basilisk exhibits robust performance on imbalanced, real-world datasets, while maintaining computational efficiency that facilitates deployment across diverse organizational scales. This research not only establishes new benchmarks in code security but also provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks, potentially redefining optimization strategies in AI development for domain-specific applications.

SI Jan 30, 2025
Israel-Hamas war through Telegram, Reddit and Twitter

Despoina Antonakaki, Sotiris Ioannidis

The Israeli-Palestinian conflict, which started on 7 October 2023, has so far resulted in over 48,000 people killed, including more than 17,000 children, with a majority from Gaza; more than 30,000 people injured; over 10,000 missing; and over 1 million people displaced, fleeing conflict zones. The infrastructure damage includes 87% of housing units, 80% of public buildings, 60% of cropland, 17 out of 36 hospitals, 68% of road networks, and 87% of school buildings. This conflict has also launched an online discussion across various social media platforms. Telegram was no exception due to its encrypted communication and highly involved audience. The current study analyzes the related discussion in relation to the different participants of the conflict and the sentiment represented in those discussions. To this end, we prepared a dataset of 125K messages shared on channels in Telegram spanning from 23 October 2025 until today. Additionally, we apply the same analysis to two publicly available datasets: one from Twitter containing 2001 tweets and one from Reddit containing 2M opinions. We apply a volume analysis across the three datasets and entity extraction, and then proceed to BERT topic analysis in order to extract common themes or topics. Next, we apply sentiment analysis to analyze the emotional tone of the discussions. Our findings hint at polarized narratives as the hallmark of how political factions and outsiders mold public opinion. We also analyze the sentiment-topic prevalence relationship, detailing the trends that may show manipulation and attempts at propaganda by the involved parties. This will give a better understanding of the online discourse on the Israel-Palestine conflict and contribute to the knowledge on the dynamics of social media communication during geopolitical crises.

NI Feb 8, 2024
LightningNet: Distributed Graph-based Cellular Network Performance Forecasting for the Edge

Konstantinos Zacharopoulos, Georgios Koutroumpas, Ioannis Arapakis et al.

The cellular network plays a pivotal role in providing Internet access, since it is the only global-scale infrastructure with ubiquitous mobility support. To manage and maintain large-scale networks, mobile network operators require timely information, or even accurate performance forecasts. In this paper, we propose LightningNet, a lightweight and distributed graph-based framework for forecasting cellular network performance, which can capture spatio-temporal dependencies that arise in the network traffic. LightningNet achieves a steady performance increase over state-of-the-art forecasting techniques, while maintaining a similar resource usage profile. The architecture is also specifically designed to support IoT and edge devices, a further advantage over the current state of the art, as indicated by our performance experiments with NVIDIA Jetson.

SI May 31, 2023
BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline

Alexander Shevtsov, Despoina Antonakaki, Ioannis Lamprou et al.

Twitter, as one of the most popular social networks, provides a platform for communication and online discourse. Unfortunately, it has also become a target for bots and fake accounts, resulting in the spread of false information and manipulation. This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development. Through this pipeline, we develop a comprehensive bot detection model named BotArtist, based on user profile features. SAMLP leverages nine distinct publicly available datasets to train the BotArtist model. To assess BotArtist's performance against current state-of-the-art solutions, we evaluate 35 existing Twitter bot detection methods, each utilizing a diverse range of features. Our comparative evaluation of BotArtist and these existing methods, conducted across nine public datasets under standardized conditions, reveals that the proposed model outperforms existing solutions by almost 10% in terms of F1-score, achieving an average score of 83.19% and 68.5% over specific and general approaches, respectively. As a result of this research, we provide one of the largest labeled Twitter bot datasets. The dataset contains extracted features combined with BotArtist predictions for 10,929,533 Twitter user profiles, collected via Twitter API during the 2022 Russo-Ukrainian War over a 16-month period. This dataset was created based on [Shevtsov et al., 2022a] where the original authors share anonymized tweets discussing the Russo-Ukrainian war, totaling 127,275,386 tweets. The combination of the existing textual dataset and the provided labeled bot and human profiles will enable future development of more advanced bot detection large language models in the post-Twitter API era.

SI Dec 8, 2021
Identification of Twitter Bots Based on an Explainable Machine Learning Framework: The US 2020 Elections Case Study

Alexander Shevtsov, Christos Tzagkarakis, Despoina Antonakaki et al.

Twitter is one of the most popular social networks, attracting millions of users and capturing a considerable proportion of online discourse. It provides a simple usage framework with short messages and an efficient application programming interface (API) enabling the research community to study and analyze several aspects of this social network. However, Twitter's simplicity of use can lead to malicious exploitation by various bots. This malicious activity expands in online discourse, especially during electoral periods, where, apart from the legitimate bots used for dissemination and communication purposes, the goal is to manipulate public opinion and the electorate towards a certain direction, specific ideology, or political party. This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. To this end, a supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm, where the hyper-parameters are tuned via cross-validation. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions by calculating feature importance, using game-theoretic Shapley values. Experimental evaluation on distinct Twitter datasets demonstrates the superiority of our approach, in terms of bot detection accuracy, when compared against a recent state-of-the-art Twitter bot detection method.

CR Jul 30, 2019
Clash of the Trackers: Measuring the Evolution of the Online Tracking Ecosystem

Konstantinos Solomos, Panagiotis Ilia, Sotiris Ioannidis et al.

Websites are constantly adapting the methods they use, and the intensity with which they track online visitors. However, the wide-ranging enforcement of GDPR, which began one year ago (May 2018), forced websites serving EU-based online visitors to eliminate or at least reduce such tracking activity unless they receive proper user consent. Therefore, it is important to record and analyze the evolution of this tracking activity, assess the overall "privacy health" of the Web ecosystem, and determine whether it has improved after GDPR enforcement. This work makes a significant step in this direction. In this paper, we analyze the online ecosystem of 3rd-parties embedded in top websites, which amass the majority of online tracking, through 6 snapshots taken a few months apart over the last 2 years. We perform this analysis in three ways: 1) by looking into the network activity that 3rd-parties impose on each publisher hosting them, 2) by constructing a bipartite graph of "publisher-to-tracker", connecting 3rd parties with their publishers, and 3) by constructing a "tracker-to-tracker" graph connecting 3rd-parties that are commonly found together in publishers. We record significant changes through time in the number of trackers, traffic induced in publishers (incoming vs. outgoing), embeddedness of trackers in publishers, and popularity and mixture of trackers across publishers. We also report how such measures compare with the ranking of publishers based on Alexa. At the last level of our analysis, we dig deeper and look into the connectivity of trackers with each other and how this relates to potential cookie synchronization activity.
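
The two graph constructions mentioned above can be sketched as follows. This is a minimal illustration that assumes crawl observations arrive as (publisher, third-party) pairs; the input format, the function names, and the choice of co-occurrence count as the edge weight are assumptions, not details from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_publisher_tracker_graph(observations):
    """Bipartite graph as a mapping: publisher -> set of third parties
    observed on that publisher."""
    graph = defaultdict(set)
    for publisher, tracker in observations:
        graph[publisher].add(tracker)
    return dict(graph)


def project_tracker_to_tracker(pub_graph):
    """One-mode projection: two trackers are linked when they appear on
    the same publisher; the weight counts the publishers they share."""
    weights = defaultdict(int)
    for trackers in pub_graph.values():
        for a, b in combinations(sorted(trackers), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical observations from a single snapshot.
    snapshot = [
        ("news.example", "t1"), ("news.example", "t2"),
        ("shop.example", "t1"), ("shop.example", "t2"),
        ("shop.example", "t3"),
    ]
    pub_graph = build_publisher_tracker_graph(snapshot)
    print(project_tracker_to_tracker(pub_graph))
```

Running the same construction on each snapshot and comparing node and edge counts is one simple way to look at change over time.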

ET May 7, 2019
An Interpretable Neural Network for Configuring Programmable Wireless Environments

Christos Liaskos, Ageliki Tsioliaridou, Shuai Nie et al.

Software-defined metasurfaces (SDMs) comprise a dense topology of basic elements called meta-atoms, exerting the highest degree of control over surface currents among intelligent panel technologies. As such, they can transform impinging electromagnetic (EM) waves in complex ways, modifying their direction, power, frequency spectrum, polarity and phase. A well-defined software interface allows for applying such functionalities to waves and inter-networking SDMs, while abstracting the underlying physics. A network of SDMs deployed over objects within an area, such as the walls of a floorplan, creates programmable wireless environments (PWEs) with fully customizable propagation of waves within them. This work studies the use of machine learning for configuring such environments to the benefit of users within. The methodology consists of modeling wireless propagation as a custom, interpretable, back-propagating neural network, with SDM elements as nodes and their cross-interactions as links. Following a training period, the network learns the propagation basics of SDMs and configures them to facilitate the communication of users within their vicinity.

ET Apr 24, 2019
Joint Compressed Sensing and Manipulation of Wireless Emissions with Intelligent Surfaces

Christos Liaskos, Ageliki Tsioliaridou, Alexandros Pitilakis et al.

Programmable, intelligent surfaces can manipulate electromagnetic waves impinging upon them, producing arbitrarily shaped reflection, refraction and diffraction, to the benefit of wireless users. Moreover, in their recent form of HyperSurfaces, they have acquired inter-networking capabilities, enabling the Internet of Material Properties with immense potential in wireless communications. However, as with any system with inputs and outputs, accurate sensing of the impinging wave attributes is imperative for programming HyperSurfaces to obtain a required response. Related solutions include field nano-sensors embedded within HyperSurfaces to perform minute measurements over the area of the HyperSurface, as well as external sensing systems. The present work proposes a sensing system that can operate without such additional hardware. The novel scheme programs the HyperSurface to perform compressed sensing of the impinging wave via simple one-antenna power measurements. The HyperSurface can jointly be programmed for both wave sensing and wave manipulation duties at the same time. Evaluation via simulations validates the concept and highlights its promising potential.

CY Jan 3, 2019
Please Forget Where I Was Last Summer: The Privacy Risks of Public Location (Meta)Data

Kostas Drakonakis, Panagiotis Ilia, Sotiris Ioannidis et al.

The exposure of location data constitutes a significant privacy risk to users as it can lead to de-anonymization, the inference of sensitive information, and even physical threats. In this paper we present LPAuditor, a tool that conducts a comprehensive evaluation of the privacy loss caused by publicly available location metadata. First, we demonstrate how our system can pinpoint users' key locations at an unprecedented granularity by identifying their actual postal addresses. Our experimental evaluation on Twitter data highlights the effectiveness of our techniques which outperform prior approaches by 18.9%-91.6% for homes and 8.7%-21.8% for workplaces. Next we present a novel exploration of automated private information inference that uncovers "sensitive" locations that users have visited (pertaining to health, religion, and sex/nightlife). We find that location metadata can provide additional context to tweets and thus lead to the exposure of private information that might not match the users' intentions. We further explore the mismatch between user actions and information exposure and find that older versions of the official Twitter apps follow a privacy-invasive policy of including precise GPS coordinates in the metadata of tweets that users have geotagged at a coarse-grained level (e.g., city). The implications of this exposure are further exacerbated by our finding that users are considerably privacy-cautious in regards to exposing precise location data. When users can explicitly select what location data is published, there is a 94.6% reduction in tweets with GPS coordinates. As part of current efforts to give users more control over their data, LPAuditor can be adopted by major services and offered as an auditing tool that informs users about sensitive information they (indirectly) expose through location metadata.

CR Dec 29, 2018
Talon: An Automated Framework for Cross-Device Tracking Detection

Konstantinos Solomos, Panagiotis Ilia, Sotiris Ioannidis et al.

Although digital advertising fuels much of today's free Web, it typically does so at the cost of online users' privacy, due to the continuous tracking and leakage of users' personal data. In search of new ways to optimize the effectiveness of ads, advertisers have introduced new advanced paradigms such as cross-device tracking (CDT), to monitor users' browsing on multiple devices and screens, and deliver (re)targeted ads on the most appropriate screen. Unfortunately, this practice leads to greater privacy concerns for the end-user. Going beyond the state-of-the-art, we propose a novel methodology for detecting CDT and measuring the factors affecting its performance, in a repeatable and systematic way. This new methodology is based on emulating realistic browsing activity of end-users, from different devices, and thus triggering and detecting cross-device targeted ads. We design and build Talon, a CDT measurement framework that implements our methodology and allows experimentation with multiple parallel devices, experimental setups and settings. By employing Talon, we perform several critical experiments, and we are able to not only detect and measure CDT with average AUC score of 0.78-0.96, but also to provide significant insights about the behavior of CDT entities and the impact on users' privacy. In the hands of privacy researchers, policy makers and end-users, Talon can be an invaluable tool for raising awareness and increasing transparency on tracking practices used by the ad-ecosystem.

CR Sep 30, 2018
Master of Web Puppets: Abusing Web Browsers for Persistent and Stealthy Computation

Panagiotis Papadopoulos, Panagiotis Ilia, Michalis Polychronakis et al.

The proliferation of web applications has essentially transformed modern browsers into small but powerful operating systems. Upon visiting a website, user devices run implicitly trusted script code, the execution of which is confined within the browser to prevent any interference with the user's system. Recent JavaScript APIs, however, provide advanced capabilities that not only enable feature-rich web applications, but also allow attackers to perform malicious operations despite the confined nature of JavaScript code execution. In this paper, we demonstrate the powerful capabilities that modern browser APIs provide to attackers by presenting MarioNet: a framework that allows a remote malicious entity to control a visitor's browser and abuse its resources for unwanted computation or harmful operations, such as cryptocurrency mining, password-cracking, and DDoS. MarioNet relies solely on already available HTML5 APIs, without requiring the installation of any additional software. In contrast to previous browser-based botnets, the persistence and stealthiness characteristics of MarioNet allow the malicious computations to continue in the background of the browser even after the user closes the window or tab of the initial malicious website. We present the design, implementation, and evaluation of a prototype system, MarioNet, that is compatible with all major browsers, and discuss potential defense strategies to counter the threat of such persistent in-browser attacks. Our main goal is to raise awareness regarding this new class of attacks, and inform the design of future browser APIs so that they provide a more secure client-side environment for web applications.