Arik Friedman

CY
h-index39
7papers
163citations
Novelty39%
AI Score24

7 Papers

SEFeb 15, 2024
Practitioners' Challenges and Perceptions of CI Build Failure Predictions at Atlassian

Yang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit et al.

Continuous Integration (CI) build failures could significantly impact the software development process and teams, such as delaying the release of new features and reducing developers' productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key factor influencing CI build failures. In addition, our qualitative survey revealed that Atlassian developers perceive CI build failures as challenging issues in practice. Furthermore, we found that the CI build prediction can not only provide proactive insight into CI build failures but also facilitate the team's decision-making. Our study sheds light on the challenges and expectations involved in integrating CI build prediction tools into the Bitbucket environment, providing valuable insights for enhancing CI processes.

CRJul 1, 2017
Measuring, Characterizing, and Detecting Facebook Like Farms

Muhammad Ikram, Lucky Onwuzurike, Shehroze Farooqi et al.

Social networks offer convenient ways to seamlessly reach out to large audiences. In particular, Facebook pages are increasingly used by businesses, brands, and organizations to connect with multitudes of users worldwide. As the number of likes of a page has become a de-facto measure of its popularity and profitability, an underground market of services artificially inflating page likes, aka like farms, has emerged alongside Facebook's official targeted advertising platform. Nonetheless, there is little work that systematically analyzes Facebook pages' promotion methods. Aiming to fill this gap, we present a honeypot-based comparative measurement study of page likes garnered via Facebook advertising and from popular like farms. First, we analyze likes based on demographic, temporal, and social characteristics, and find that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users' behavior. Next, we look at fraud detection algorithms currently deployed by Facebook and show that they do not work well to detect stealthy farms which spread likes over longer timespans and like popular pages to mimic regular users. To overcome their limitations, we investigate the feasibility of timeline-based detection of like farm accounts, focusing on characterizing content generated by Facebook accounts on their timelines as an indicator of genuine versus fake social activity. We analyze a range of features, grouped into two main categories: lexical and non-lexical. We find that like farm accounts tend to re-share content, use fewer words and poorer vocabulary, and more often generate duplicate comments and likes compared to normal users. Using relevant lexical and non-lexical features, we build a classifier to detect like farms accounts that achieves precision higher than 99% and 93% recall.

SIJun 1, 2015
Combating Fraud in Online Social Networks: Detecting Stealthy Facebook Like Farms

Muhammad Ikram, Lucky Onwuzurike, Shehroze Farooqi et al.

As businesses increasingly rely on social networking sites to engage with their customers, it is crucial to understand and counter reputation manipulation activities, including fraudulently boosting the number of Facebook page likes using like farms. To this end, several fraud detection algorithms have been proposed and some deployed by Facebook that use graph co-clustering to distinguish between genuine likes and those generated by farm-controlled profiles. However, as we show in this paper, these tools do not work well with stealthy farms whose users spread likes over longer timespans and like popular pages, aiming to mimic regular users. We present an empirical analysis of the graph-based detection tools used by Facebook and highlight their shortcomings against more sophisticated farms. Next, we focus on characterizing content generated by social networks accounts on their timelines, as an indicator of genuine versus fake social activity. We analyze a wide range of features extracted from timeline posts, which we group into two main classes: lexical and non-lexical. We postulate and verify that like farm accounts tend to often re-share content, use fewer words and poorer vocabulary, and more often generate duplicate comments and likes compared to normal users. We extract relevant lexical and non-lexical features and and use them to build a classifier to detect like farms accounts, achieving significantly higher accuracy, namely, at least 99% precision and 93% recall.

CYMay 7, 2015
Characterizing Key Stakeholders in an Online Black-Hat Marketplace

Shehroze Farooqi, Muhammad Ikram, Emiliano De Cristofaro et al.

Over the past few years, many black-hat marketplaces have emerged that facilitate access to reputation manipulation services such as fake Facebook likes, fraudulent search engine optimization (SEO), or bogus Amazon reviews. In order to deploy effective technical and legal countermeasures, it is important to understand how these black-hat marketplaces operate, shedding light on the services they offer, who is selling, who is buying, what are they buying, who is more successful, why are they successful, etc. Toward this goal, in this paper, we present a detailed micro-economic analysis of a popular online black-hat marketplace, namely, SEOClerks.com. As the site provides non-anonymized transaction information, we set to analyze selling and buying behavior of individual users, propose a strategy to identify key users, and study their tactics as compared to other (non-key) users. We find that key users: (1) are mostly located in Asian countries, (2) are focused more on selling black-hat SEO services, (3) tend to list more lower priced services, and (4) sometimes buy services from other sellers and then sell at higher prices. Finally, we discuss the implications of our analysis with respect to devising effective economic and legal intervention strategies against marketplace operators and key users.

LGFeb 9, 2015
Rademacher Observations, Private Data, and Boosting

Richard Nock, Giorgio Patrini, Arik Friedman

The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear (or kernelized) classifiers, the minimization of the logistic loss is \textit{equivalent} to the minimization of an exponential \textit{rado}-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the \textit{same} classifier as the one of the logistic loss. Thus, a classifier learnt from rados can be \textit{directly} used to classify \textit{observations}. We provide a learning algorithm over rados with boosting-compliant convergence rates on the \textit{logistic loss} (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, display that learning over a small set of random rados can challenge the state of the art that learns over the \textit{complete} set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados can compete with learning from random rados, and hence with batch learning from examples, achieving non-trivial privacy vs accuracy tradeoffs.

SISep 7, 2014
Paying for Likes? Understanding Facebook Like Fraud Using Honeypots

Emiliano De Cristofaro, Arik Friedman, Guillaume Jourjon et al.

Facebook pages offer an easy way to reach out to a very large audience as they can easily be promoted using Facebook's advertising platform. Recently, the number of likes of a Facebook page has become a measure of its popularity and profitability, and an underground market of services boosting page likes, aka like farms, has emerged. Some reports have suggested that like farms use a network of profiles that also like other pages to elude fraud protection algorithms, however, to the best of our knowledge, there has been no systematic analysis of Facebook pages' promotion methods. This paper presents a comparative measurement study of page likes garnered via Facebook ads and by a few like farms. We deploy a set of honeypot pages, promote them using both methods, and analyze garnered likes based on likers' demographic, temporal, and social characteristics. We highlight a few interesting findings, including that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users' behavior.

CYFeb 14, 2014
Censorship in the Wild: Analyzing Internet Filtering in Syria

Abdelberi Chaabane, Terence Chen, Mathieu Cunche et al.

Internet censorship is enforced by numerous governments worldwide, however, due to the lack of publicly available information, as well as the inherent risks of performing active measurements, it is often hard for the research community to investigate censorship practices in the wild. Thus, the leak of 600GB worth of logs from 7 Blue Coat SG-9000 proxies, deployed in Syria to filter Internet traffic at a country scale, represents a unique opportunity to provide a detailed snapshot of a real-world censorship ecosystem. This paper presents the methodology and the results of a measurement analysis of the leaked Blue Coat logs, revealing a relatively stealthy, yet quite targeted, censorship. We find that traffic is filtered in several ways: using IP addresses and domain names to block subnets or websites, and keywords or categories to target specific content. We show that keyword-based censorship produces some collateral damage as many requests are blocked even if they do not relate to sensitive content. We also discover that Instant Messaging is heavily censored, while filtering of social media is limited to specific pages. Finally, we show that Syrian users try to evade censorship by using web/socks proxies, Tor, VPNs, and BitTorrent. To the best of our knowledge, our work provides the first analytical look into Internet filtering in Syria.