NIJun 23, 2025
A Comprehensive Survey on Network Traffic Synthesis: From Statistical Models to Deep LearningNirhoshan Sivaroopan, Kaushitha Silva, Chamara Madarasingha et al.
Synthetic network traffic generation has emerged as a promising alternative for various data-driven applications in the networking domain. It enables the creation of synthetic data that preserves real-world characteristics while addressing key challenges such as data scarcity, privacy concerns, and purity constraints associated with real data. In this survey, we provide a comprehensive review of synthetic network traffic generation approaches, covering essential aspects such as data types, generation models, and evaluation methods. With the rapid advancements in AI and machine learning, we focus particularly on deep learning-based techniques while also providing a detailed discussion of statistical methods and their extensions, including commercially available tools. Furthermore, we highlight open challenges in this domain and discuss potential future directions for further research and development. This survey serves as a foundational resource for researchers and practitioners, offering a structured analysis of existing methods, challenges, and opportunities in synthetic network traffic generation.
CRJun 2, 2020
A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play StoreNaveen Karunanayake, Jathushan Rajasegaran, Ashanie Gunathillake et al.
Counterfeit apps impersonate existing popular apps in attempts to misguide users to install them for various reasons such as collecting personal information or spreading malware. Many counterfeits can be identified once installed, however even a tech-savvy user may struggle to detect them before installation. To this end, this paper proposes to leverage the recent advances in deep learning methods to create image and text embeddings so that counterfeit apps can be efficiently identified when they are submitted for publication. We show that a novel approach of combining content embeddings and style embeddings outperforms the baseline methods for image similarity such as SIFT, SURF, and various image hashing methods. We first evaluate the performance of the proposed method on two well-known datasets for evaluating image similarity methods and show that content, style, and combined embeddings increase precision@k and recall@k by 10%-15% and 12%-25%, respectively when retrieving five nearest neighbours. Second, specifically for the app counterfeit detection problem, combined content and style embeddings achieve 12% and 14% increase in precision@k and recall@k, respectively compared to the baseline methods. Third, we present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for top-10,000 popular apps. Under a conservative assumption, we were able to find 2,040 potential counterfeits that contain malware in a set of 49,608 apps that showed high similarity to one of the top-10,000 popular apps in Google Play Store. We also find 1,565 potential counterfeits asking for at least five additional dangerous permissions than the original app and 1,407 potential counterfeits having at least five extra third party advertisement libraries.
PFMar 21, 2019
Impact of network delays on Hyperledger FabricThanh Son Lam Nguyen, Guillaume Jourjon, Maria Potop-Butucaru et al.
Blockchain has become one of the most attractive technologies for applications, with a large range of deployments such as production, economy, or banking. Under the hood, Blockchain technology is a type of distributed database that supports untrusted parties. In this paper we focus Hyperledger Fabric, the first blockchain in the market tailored for a private environment, allowing businesses to create a permissioned network. Hyperledger Fabric implements a PBFT consensus in order to maintain a non forking blockchain at the application level. We deployed this framework over an area network between France and Germany in order to evaluate its performance when potentially large network delays are observed. Overall we found that when network delay increases significantly (i.e. up to 3.5 seconds at network layer between two clouds), we observed that the blocks added to our blockchain had up to 134 seconds offset after 100 th block from one cloud to another. Thus by delaying block propagation, we demonstrated that Hyperledger Fabric does not provide sufficient consistency guaranties to be deployed in critical environments. Our work, is the fist to evidence the negative impact of network delays on a PBFT-based blockchain.
CRFeb 26, 2019
The Attack of the Clones Against Proof-of-AuthorityParinya Ekparinya, Vincent Gramoli, Guillaume Jourjon
In this paper, we explore vulnerabilities and countermeasures of the recently proposed blockchain consensus based on proof-of-authority. The proof-of-work blockchains, like Bitcoin and Ethereum, have been shown both theoretically and empirically vulnerable to double spending attacks. This is why Byzantine fault tolerant consensus algorithms have gained popularity in the blockchain context for their ability to tolerate a limited number t of attackers among n participants. We formalize the recently proposed proof-of-authority consensus algorithms that are Byzantine fault tolerant by describing the Aura and Clique protocols present in the two mainstream implementations of Ethereum. We then introduce the Cloning Attack and show how to apply it to double spend in each of these protocols with a single malicious node. Our results show that the Cloning Attack against Aura is always successful while the same attack against Clique is about twice as fast and succeeds in most cases.
CRMay 14, 2018
Double-Spending Risk Quantification in Private, Consortium and Public Ethereum BlockchainsParinya Ekparinya, Vincent Gramoli, Guillaume Jourjon
Recently, several works conjectured the vulnerabilities of mainstream blockchains under several network attacks. All these attacks translate into showing that the assumptions of these blockchains can be violated in theory or under simulation at best. Unfortunately, previous results typically omit both the nature of the network under which the blockchain code runs and whether blockchains are private, consortium or public. In this paper, we study the public Ethereum blockchain as well as a consortium and private blockchains and quantify the feasibility of man-in-the-middle and double spending attacks against them. To this end, we list important properties of the Ethereum public blockchain topology, we deploy VMs with constrained CPU quantum to mimic the top-10 mining pools of Ethereum and we develop full-fledged attacks, that first partition the network through BGP hijacking or ARP spoofing before issuing a Balance Attack to steal coins. Our results demonstrate that attacking Ethereum is remarkably devastating in a consortium or private context as the adversary can multiply her digital assets by 200, 000x in 10 hours through BGP hijacking whereas it would be almost impossible in a public context.
CRApr 26, 2018
A Neural Embeddings Approach for Detecting Mobile Counterfeit AppsJathushan Rajasegaran, Suranga Seneviratne, Guillaume Jourjon
Counterfeit apps impersonate existing popular apps in attempts to misguide users to install them for various reasons such as collecting personal information, spreading malware, or simply to increase their advertisement revenue. Many counterfeits can be identified once installed, however even a tech-savvy user may struggle to detect them before installation as app icons and descriptions can be quite similar to the original app. To this end, this paper proposes to use neural embeddings generated by state-of-the-art convolutional neural networks (CNNs) to measure the similarity between images. Our results show that for the problem of counterfeit detection a novel approach of using style embeddings given by the Gram matrix of CNN filter responses outperforms baseline methods such as content embeddings and SIFT features. We show that further performance increases can be achieved by combining style embeddings with content embeddings. We present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for top-1,000 apps. Under a conservative assumption, we were able to find 139 apps that contain malware in a set of 6,880 apps that showed high visual similarity to one of the top-1,000 apps in Google Play Store.
CYJan 31, 2018
A Delay-Tolerant Payment Scheme Based on the Ethereum BlockchainYining Hu, Ahsan Manzoor, Parinya Ekparinya et al.
Banking as an essential service can be hard to access in remote, rural regions where the network connectivity is intermittent. Although micro-banking has been made possible by SMS or USSD messages in some places, their security flaws and session-based nature prevent them from a wider adoption. Global level cryptocurrencies enable low-cost, secure and pervasive money transferring among distributed peers, but are still limited in their ability to reach more people in remote communities. We proposed to take advantage of the delay-tolerant nature of blockchains to deliver banking services to remote communities that only connect to the broader Internet intermittently. Using a base station that offers connectivity within the local area, regular transaction processing is solely handled by blockchain miners. The bank only joins to process currency exchange requests, reward miners and track user balances when the connection is available. By distributing the verification and storage tasks among peers, our system design saves on the overall deployment and operational costs without sacrificing the reliability and trustwor- thiness. Through theoretical and empirical analysis, we provided insights to system design, tested its robustness against network disturbances, and demonstrated the feasibility of implementation on off-the-shelf computers and mobile devices.
CRJul 1, 2017
Measuring, Characterizing, and Detecting Facebook Like FarmsMuhammad Ikram, Lucky Onwuzurike, Shehroze Farooqi et al.
Social networks offer convenient ways to seamlessly reach out to large audiences. In particular, Facebook pages are increasingly used by businesses, brands, and organizations to connect with multitudes of users worldwide. As the number of likes of a page has become a de-facto measure of its popularity and profitability, an underground market of services artificially inflating page likes, aka like farms, has emerged alongside Facebook's official targeted advertising platform. Nonetheless, there is little work that systematically analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present a honeypot-based comparative measurement study of page likes garnered via Facebook advertising and from popular like farms. First, we analyze likes based on demographic, temporal, and social characteristics, and find that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users' behavior. Next, we look at fraud detection algorithms currently deployed by Facebook and show that they do not work well to detect stealthy farms which spread likes over longer timespans and like popular pages to mimic regular users. To overcome their limitations, we investigate the feasibility of timeline-based detection of like farm accounts, focusing on characterizing content generated by Facebook accounts on their timelines as an indicator of genuine versus fake social activity. We analyze a range of features, grouped into two main categories: lexical and non-lexical. We find that like farm accounts tend to re-share content, use fewer words and poorer vocabulary, and more often generate duplicate comments and likes compared to normal users. Using relevant lexical and non-lexical features, we build a classifier to detect like farms accounts that achieves precision higher than 99% and 93% recall.
SIJun 1, 2015
Combating Fraud in Online Social Networks: Detecting Stealthy Facebook Like FarmsMuhammad Ikram, Lucky Onwuzurike, Shehroze Farooqi et al.
As businesses increasingly rely on social networking sites to engage with their customers, it is crucial to understand and counter reputation manipulation activities, including fraudulently boosting the number of Facebook page likes using like farms. To this end, several fraud detection algorithms have been proposed and some deployed by Facebook that use graph co-clustering to distinguish between genuine likes and those generated by farm-controlled profiles. However, as we show in this paper, these tools do not work well with stealthy farms whose users spread likes over longer timespans and like popular pages, aiming to mimic regular users. We present an empirical analysis of the graph-based detection tools used by Facebook and highlight their shortcomings against more sophisticated farms. Next, we focus on characterizing content generated by social networks accounts on their timelines, as an indicator of genuine versus fake social activity. We analyze a wide range of features extracted from timeline posts, which we group into two main classes: lexical and non-lexical. We postulate and verify that like farm accounts tend to often re-share content, use fewer words and poorer vocabulary, and more often generate duplicate comments and likes compared to normal users. We extract relevant lexical and non-lexical features and and use them to build a classifier to detect like farms accounts, achieving significantly higher accuracy, namely, at least 99% precision and 93% recall.
CYMay 7, 2015
Characterizing Key Stakeholders in an Online Black-Hat MarketplaceShehroze Farooqi, Muhammad Ikram, Emiliano De Cristofaro et al.
Over the past few years, many black-hat marketplaces have emerged that facilitate access to reputation manipulation services such as fake Facebook likes, fraudulent search engine optimization (SEO), or bogus Amazon reviews. In order to deploy effective technical and legal countermeasures, it is important to understand how these black-hat marketplaces operate, shedding light on the services they offer, who is selling, who is buying, what are they buying, who is more successful, why are they successful, etc. Toward this goal, in this paper, we present a detailed micro-economic analysis of a popular online black-hat marketplace, namely, SEOClerks.com. As the site provides non-anonymized transaction information, we set to analyze selling and buying behavior of individual users, propose a strategy to identify key users, and study their tactics as compared to other (non-key) users. We find that key users: (1) are mostly located in Asian countries, (2) are focused more on selling black-hat SEO services, (3) tend to list more lower priced services, and (4) sometimes buy services from other sellers and then sell at higher prices. Finally, we discuss the implications of our analysis with respect to devising effective economic and legal intervention strategies against marketplace operators and key users.
SISep 7, 2014
Paying for Likes? Understanding Facebook Like Fraud Using HoneypotsEmiliano De Cristofaro, Arik Friedman, Guillaume Jourjon et al.
Facebook pages offer an easy way to reach out to a very large audience as they can easily be promoted using Facebook's advertising platform. Recently, the number of likes of a Facebook page has become a measure of its popularity and profitability, and an underground market of services boosting page likes, aka like farms, has emerged. Some reports have suggested that like farms use a network of profiles that also like other pages to elude fraud protection algorithms, however, to the best of our knowledge, there has been no systematic analysis of Facebook pages' promotion methods. This paper presents a comparative measurement study of page likes garnered via Facebook ads and by a few like farms. We deploy a set of honeypot pages, promote them using both methods, and analyze garnered likes based on likers' demographic, temporal, and social characteristics. We highlight a few interesting findings, including that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users' behavior.