Rami Puzis

h-index28

26papers

146citations

Novelty43%

AI Score45

Ranked #42,237 of 194,257 authors (top 22%)#901 in CR (top 13%)

26 Papers

2.3SEDec 29, 2022

Cross Version Defect Prediction with Class Dependency Embeddings

Moti Cohen, Lior Rokach, Rami Puzis

Software Defect Prediction aims at predicting which software modules are the most probable to contain defects. The idea behind this approach is to save time during the development process by helping find bugs early. Defect Prediction models are based on historical data. Specifically, one can use data collected from past software distributions, or Versions, of the same target application under analysis. Defect Prediction based on past versions is called Cross Version Defect Prediction (CVDP). Traditionally, Static Code Metrics are used to predict defects. In this work, we use the Class Dependency Network (CDN) as another predictor for defects, combined with static code metrics. CDN data contains structural information about the target application being analyzed. Usually, CDN data is analyzed using different handcrafted network measures, like Social Network metrics. Our approach uses network embedding techniques to leverage CDN information without having to build the metrics manually. In order to use the embeddings between versions, we incorporate different embedding alignment techniques. To evaluate our approach, we performed experiments on 24 software release pairs and compared it against several benchmark methods. In these experiments, we analyzed the performance of two different graph embedding techniques, three anchor selection approaches, and two alignment techniques. We also built a meta-model based on two different embeddings and achieved a statistically significant improvement in AUC of 4.7% (p < 0.002) over the baseline method.

6.1CLSep 29, 2024Code

Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales

Maor Reuben, Ortal Slobodin, Aviad Elyshar et al.

Human-like personality traits have recently been discovered in large language models, raising the hypothesis that their (known and as yet undiscovered) biases conform with human latent psychological constructs. While large conversational models may be tricked into answering psychometric questionnaires, the latent psychological constructs of thousands of simpler transformers, trained for other tasks, cannot be assessed because appropriate psychometric methods are currently lacking. Here, we show how standard psychological questionnaires can be reformulated into natural language inference prompts, and we provide a code library to support the psychometric assessment of arbitrary models. We demonstrate, using a sample of 88 publicly available models, the existence of human-like mental health-related constructs (including anxiety, depression, and Sense of Coherence) which conform with standard theories in human psychology and show similar correlations and mitigation strategies. The ability to interpret and rectify the performance of language models by using psychological tools can boost the development of more explainable, controllable, and trustworthy models.

1.2SINov 10, 2022Code

Can one hear the position of nodes?

Rami Puzis

Wave propagation through nodes and links of a network forms the basis of spectral graph theory. Nevertheless, the sound emitted by nodes within the resonating chamber formed by a network are not well studied. The sound emitted by vibrations of individual nodes reflects the structure of the overall network topology but also the location of the node within the network. In this article, a sound recognition neural network is trained to infer centrality measures from the nodes' wave-forms. In addition to advancing network representation learning, sounds emitted by nodes are plausible in most cases. Auralization of the network topology may open new directions in arts, competing with network visualization.

0.6CLFeb 19

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

Anton Dzega, Aviad Elyashar, Ortal Slobodin et al.

Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. This test is a projective psychological framework designed to uncover unconscious aspects of personality. This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities, using the Social Cognition and Object Relations Scale - Global (SCORS-G). LMMs are employed in two distinct roles: as subject models (SMs), which generate stories in response to TAT images, and as evaluator models (EMs), who assess these narratives using the SCORS-G framework. Evaluators demonstrated an excellent ability to understand and analyze TAT responses. Their interpretations are highly consistent with those of human experts. Assessment results highlight that all models understand interpersonal dynamics very well and have a good grasp of the concept of self. However, they consistently fail to perceive and regulate aggression. Performance varied systematically across model families, with larger and more recent models consistently outperforming smaller and earlier ones across SCORS-G dimensions.

1.4LGMar 2

Learning graph topology from metapopulation epidemic encoder-decoder

Xin Li, Jonathan Cohen, Shai Pilosof et al.

Metapopulation epidemic models are a valuable tool for studying large-scale outbreaks. With the limited availability of epidemic tracing data, it is challenging to infer the essential constituents of these models, namely, the epidemic parameters and the relevant mobility network between subpopulations. Either one of these constituents can be estimated while assuming the other; however, the problem of their joint inference has not yet been solved. Here, we propose two encoder-decoder deep learning architectures that infer metapopulation mobility graphs from time-series data, with and without the assumption of epidemic model parameters. Evaluation across diverse random and empirical mobility networks shows that the proposed approach outperforms the state-of-the-art topology inference. Further, we show that topology inference improves dramatically with data on additional pathogens. Our study establishes a robust framework for simultaneously inferring epidemic parameters and topology, addressing a persistent gap in modeling disease propagation.

10.2CRMar 3, 2020

SoK: A Survey of Open-Source Threat Emulators

Polina Zilberman, Rami Puzis, Sunders Bruskin et al.

Threat emulators are tools or sets of scripts that emulate cyber attacks or malicious behavior. They can be used to create and launch single procedure attacks and multi-step attacks; the resulting attacks may be known or unknown cyber attacks. The motivation for using threat emulators varies and includes the need to perform automated security audits in organizations or reduce the size of red teams in order to lower pen testing costs; or the desire to create baseline tests for security tools under development or supply pen testers with another tool in their arsenal. In this paper, we review and compare various open-source threat emulators. We focus on tactics and techniques from the MITRE ATT&CK Enterprise matrix and determine whether they can be performed and tested with the emulators. We develop a comprehensive methodology for our qualitative and quantitative comparison of threat emulators with respect to general features, such as prerequisites, attack definition, cleanup, and more. Finally, we discuss the circumstances in which one threat emulator is preferred over another. This survey can help security teams, security developers, and product deployment teams examine their network environment or products with the most suitable threat emulator. Using the guidelines provided, a team can select the threat emulator that best meets their needs without evaluating all of them.

3.6CRJun 3, 2025

ATAG: AI-Agent Application Threat Assessment with Attack Graphs

Parth Atulbhai Gandhi, Akansha Shukla, David Tayouri et al.

Evaluating the security of multi-agent systems (MASs) powered by large language models (LLMs) is challenging, primarily because of the systems' complex internal dynamics and the evolving nature of LLM vulnerabilities. Traditional attack graph (AG) methods often lack the specific capabilities to model attacks on LLMs. This paper introduces AI-agent application Threat assessment with Attack Graphs (ATAG), a novel framework designed to systematically analyze the security risks associated with AI-agent applications. ATAG extends the MulVAL logic-based AG generation tool with custom facts and interaction rules to accurately represent AI-agent topologies, vulnerabilities, and attack scenarios. As part of this research, we also created the LLM vulnerability database (LVD) to initiate the process of standardizing LLM vulnerabilities documentation. To demonstrate ATAG's efficacy, we applied it to two multi-agent applications. Our case studies demonstrated the framework's ability to model and generate AGs for sophisticated, multi-step attack scenarios exploiting vulnerabilities such as prompt injection, excessive agency, sensitive information disclosure, and insecure output handling across interconnected agents. ATAG is an important step toward a robust methodology and toolset to help understand, visualize, and prioritize complex attack paths in multi-agent AI systems (MAASs). It facilitates proactive identification and mitigation of AI-agent threats in multi-agent applications.

3.6CRJun 13, 2025

FAA Framework: A Large Language Model-Based Approach for Credit Card Fraud Investigations

Shaun Shuster, Eyal Zaloof, Asaf Shabtai et al.

The continuous growth of the e-commerce industry attracts fraudsters who exploit stolen credit card details. Companies often investigate suspicious transactions in order to retain customer trust and address gaps in their fraud detection systems. However, analysts are overwhelmed with an enormous number of alerts from credit card transaction monitoring systems. Each alert investigation requires from the fraud analysts careful attention, specialized knowledge, and precise documentation of the outcomes, leading to alert fatigue. To address this, we propose a fraud analyst assistant (FAA) framework, which employs multi-modal large language models (LLMs) to automate credit card fraud investigations and generate explanatory reports. The FAA framework leverages the reasoning, code execution, and vision capabilities of LLMs to conduct planning, evidence collection, and analysis in each investigation step. A comprehensive empirical evaluation of 500 credit card fraud investigations demonstrates that the FAA framework produces reliable and efficient investigations comprising seven steps on average. Thus we found that the FAA framework can automate large parts of the workload and help reduce the challenges faced by fraud analysts.

2.3CRNov 20, 2024

The Information Security Awareness of Large Language Models

Ofir Cohen, Gil Ari Agmon, Asaf Shabtai et al.

The popularity of large language models (LLMs) continues to grow, and LLM-based assistants have become ubiquitous. Information security awareness (ISA) is an important yet underexplored safety aspect of LLMs. ISA encompasses LLMs' security knowledge, which has been explored in the past, as well as attitudes and behaviors, which are crucial to LLMs' ability to understand implicit security context and reject unsafe requests that may cause the LLM to fail the user. We present an automated method for measuring the ISA of LLMs, which covers all 30 security topics in a mobile ISA taxonomy, using realistic scenarios that create tension between implicit security implications and user satisfaction. Applying this method to leading LLMs, we find that most of the popular models exhibit only medium to low levels of ISA, exposing their users to cybersecurity threats. Smaller variants of the same model family are significantly riskier, while newer versions show no consistent ISA improvement, suggesting that providers are not actively working toward mitigating this issue. These results reveal a widespread vulnerability affecting current LLM deployments: the majority of popular models, and particularly their smaller variants, may systematically endanger users. We propose a practical mitigation: incorporating our security awareness instruction into model system prompts to help LLMs better detect and reject unsafe requests.

2.8CVMay 11, 2023

ReMark: Receptive Field based Spatial WaterMark Embedding Optimization using Deep Network

Natan Semyonov, Rami Puzis, Asaf Shabtai et al.

Watermarking is one of the most important copyright protection tools for digital media. The most challenging type of watermarking is the imperceptible one, which embeds identifying information in the data while retaining the latter's original quality. To fulfill its purpose, watermarks need to withstand various distortions whose goal is to damage their integrity. In this study, we investigate a novel deep learning-based architecture for embedding imperceptible watermarks. The key insight guiding our architecture design is the need to correlate the dimensions of our watermarks with the sizes of receptive fields (RF) of modules of our architecture. This adaptation makes our watermarks more robust, while also enabling us to generate them in a way that better maintains image quality. Extensive evaluations on a wide variety of distortions show that the proposed method is robust against most common distortions on watermarks including collusive distortion.

3.0IRDec 23, 2020

Fake News Data Collection and Classification: Iterative Query Selection for Opaque Search Engines with Pseudo Relevance Feedback

Aviad Elyashar, Maor Reuben, Rami Puzis

Retrieving information from an online search engine, is the first and most important step in many data mining tasks. Most of the search engines currently available on the web, including all social media platforms, are black-boxes (a.k.a opaque) supporting short keyword queries. In these settings, retrieving all posts and comments discussing a particular news item automatically and at large scales is a challenging task. In this paper, we propose a method for generating short keyword queries given a prototype document. The proposed iterative query selection algorithm (IQS) interacts with the opaque search engine to iteratively improve the query. It is evaluated on the Twitter TREC Microblog 2012 and TREC-COVID 2019 datasets showing superior performance compared to state-of-the-art. IQS is applied to automatically collect a large-scale fake news dataset of about 70K true and fake news items. The dataset, publicly available for research, includes more than 22M accounts and 61M tweets in Twitter approved format. We demonstrate the usefulness of the dataset for fake news detection task achieving state-of-the-art performance.

8.8CRNov 28, 2020

Cyberbiosecurity: DNA Injection Attack in Synthetic Biology

Dor Farbiash, Rami Puzis

Today arbitrary synthetic DNA can be ordered online and delivered within several days. In order to regulate both intentional and unintentional generation of dangerous substances, most synthetic gene providers screen DNA orders. A weakness in the Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA allows screening protocols based on this guidance to be circumvented using a generic obfuscation procedure inspired by early malware obfuscation techniques. Furthermore, accessibility and automation of the synthetic gene engineering workflow, combined with insufficient cybersecurity controls, allow malware to interfere with biological processes within the victim's lab, closing the loop with the possibility of an exploit written into a DNA molecule presented by Ney et al. in USENIX Security'17. Here we present an end-to-end cyberbiological attack, in which unwitting biologists may be tricked into generating dangerous substances within their labs. Consequently, despite common biosecurity assumptions, the attacker does not need to have physical contact with the generated substance. The most challenging part of the attack, decoding of the obfuscated DNA, is executed within living cells while using primitive biological operations commonly employed by biologists during in-vivo gene editing. This attack scenario underlines the need to harden the synthetic DNA supply chain with protections against cyberbiological threats. To address these threats we propose an improved screening protocol that takes into account in-vivo gene editing.

1.2NIOct 3, 2020

Predicting traffic overflows on private peering

Elad Rapaport, Ingmar Poese, Polina Zilberman et al.

Large content providers and content distribution network operators usually connect with large Internet service providers (eyeball networks) through dedicated private peering. The capacity of these private network interconnects is provisioned to match the volume of the real content demand by the users. Unfortunately, in case of a surge in traffic demand, for example due to a content trending in a certain country, the capacity of the private interconnect may deplete and the content provider/distributor would have to reroute the excess traffic through transit providers. Although, such overflow events are rare, they have significant negative impacts on content providers, Internet service providers, and end-users. These include unexpected delays and disruptions reducing the user experience quality, as well as direct costs paid by the Internet service provider to the transit providers. If the traffic overflow events could be predicted, the Internet service providers would be able to influence the routes chosen for the excess traffic to reduce the costs and increase user experience quality. In this article we propose a method based on an ensemble of deep learning models to predict overflow events over a short term horizon of 2-6 hours and predict the specific interconnections that will ingress the overflow traffic. The method was evaluated with 2.5 years' traffic measurement data from a large European Internet service provider resulting in a true-positive rate of 0.8 while maintaining a 0.05 false-positive rate. The lockdown imposed by the COVID-19 pandemic reduced the overflow prediction accuracy. Nevertheless, starting from the end of April 2020 with the gradual lockdown release, the old models trained before the pandemic perform equally well.

0.3CLMay 24, 2020Code

How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning

Aviad Elyashar, Rami Puzis, Michael Fire

Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users are aided by queries containing a name and sending back to the web search engines for finding their will. Typically, Web search engines provide just a few accurate results associated with a name-containing query. Currently, most solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding, however very often, the performance of such solutions is less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic approach which addresses the similar name suggestion problem by utilizing automated speech generation, and deep learning to produce spoken name embeddings. This sophisticated and innovative embeddings captures the way people pronounce names in any language and accent. Utilizing the name pronunciation can be helpful for both differentiating and detecting names that sound alike, but are written differently. The proposed approach was demonstrated on a large-scale dataset consisting of 250,000 forenames and evaluated using a machine learning classifier and 7,399 names with their verified synonyms. The performance of the proposed approach was found to be superior to 10 other algorithms evaluated in this study, including well used phonetic and string similarity algorithms, and two recently proposed algorithms. The results obtained suggest that the proposed approach could serve as a useful and valuable tool for solving the similar name suggestion problem.

5.2CRMar 7, 2020

ATHAFI: Agile Threat Hunting And Forensic Investigation

Rami Puzis, Polina Zilberman, Yuval Elovici

Attackers rapidly change their attacks to evade detection. Even the most sophisticated Intrusion Detection Systems that are based on artificial intelligence and advanced data analytic cannot keep pace with the rapid development of new attacks. When standard detection mechanisms fail or do not provide sufficient forensic information to investigate and mitigate attacks, targeted threat hunting performed by competent personnel is used. Unfortunately, many organization do not have enough security analysts to perform threat hunting tasks and today the level of automation of threat hunting is low. In this paper we describe a framework for agile threat hunting and forensic investigation (ATHAFI), which automates the threat hunting process at multiple levels. Adaptive targeted data collection, attack hypotheses generation, hypotheses testing, and continuous threat intelligence feeds allow to perform simple investigations in a fully automated manner. The increased level of automation will significantly boost the analyst's productivity during investigation of the harshest cases. Special Workflow Generation module adapts the threat hunting procedures either to the latest Threat Intelligence obtained from external sources (e.g. National CERT) or to the likeliest attack hypotheses generated by the Attack Hypotheses Generation module. The combination of Attack Hypotheses Generation and Workflows Generation enables intelligent adjustment of workflows, which react to emerging threats effectively.

11.5CRMar 5, 2020

DANTE: A framework for mining and monitoring darknet traffic

Dvir Cohen, Yisroel Mirsky, Yuval Elovici et al.

Trillions of network packets are sent over the Internet to destinations which do not exist. This 'darknet' traffic captures the activity of botnets and other malicious campaigns aiming to discover and compromise devices around the world. In order to mine threat intelligence from this data, one must be able to handle large streams of logs and represent the traffic patterns in a meaningful way. However, by observing how network ports (services) are used, it is possible to capture the intent of each transmission. In this paper, we present DANTE: a framework and algorithm for mining darknet traffic. DANTE learns the meaning of targeted network ports by applying Word2Vec to observed port sequences. Then, when a host sends a new sequence, DANTE represents the transmission as the average embedding of the ports found that sequence. Finally, DANTE uses a novel and incremental time-series cluster tracking algorithm on observed sequences to detect recurring behaviors and new emerging threats. To evaluate the system, we ran DANTE on a full year of darknet traffic (over three Tera-Bytes) collected by the largest telecommunications provider in Europe, Deutsche Telekom and analyzed the results. DANTE discovered 1,177 new emerging threats and was able to track malicious campaigns over time. We also compared DANTE to the current best approach and found DANTE to be more practical and effective at detecting darknet traffic patterns.

1.2NIFeb 23, 2020

Sequence Preserving Network Traffic Generation

Sigal Shaked, Amos Zamir, Roman Vainshtein et al.

We present the Network Traffic Generator (NTG), a framework for perturbing recorded network traffic with the purpose of generating diverse but realistic background traffic for network simulation and what-if analysis in enterprise environments. The framework preserves many characteristics of the original traffic recorded in an enterprise, as well as sequences of network activities. Using the proposed framework, the original traffic flows are profiled using 200 cross-protocol features. The traffic is aggregated into flows of packets between IP pairs and clustered into groups of similar network activities. Sequences of network activities are then extracted. We examined two methods for extracting sequences of activities: a Markov model and a neural language model. Finally, new traffic is generated using the extracted model. We developed a prototype of the framework and conducted extensive experiments based on two real network traffic collections. Hypothesis testing was used to examine the difference between the distribution of original and generated features, showing that 30-100\% of the extracted features were preserved. Small differences between n-gram perplexities in sequences of network activities in the original and generated traffic, indicate that sequences of network activities were well preserved.

3.1IRDec 9, 2019

It Runs in the Family: Searching for Synonyms Using Digitized Family Trees

Aviad Elyashar, Rami Puzis, Michael Fire

Searching for a person's name is a common online activity. However, Web search engines provide few accurate results to queries containing names. In contrast to a general word which has only one correct spelling, there are several legitimate spellings of a given name. Today, most techniques used to suggest synonyms in online search are based on pattern matching and phonetic encoding, however they often perform poorly. As a result, there is a need for an effective tool for improved synonym suggestion. In this paper, we propose a revolutionary approach for tackling the problem of synonym suggestion. Our novel algorithm, GRAFT, utilizes historical data collected from genealogy websites, along with network algorithms. GRAFT is a general algorithm that suggests synonyms using a graph based on names derived from digitized ancestral family trees. Synonyms are extracted from this graph, which is constructed using generic ordering functions that outperform other algorithms that suggest synonyms based on a single dimension, a factor that limits their performance. We evaluated GRAFT's performance on three ground truth datasets of forenames and surnames, including a large-scale online genealogy dataset with over 16 million profiles and more than 700,000 unique forenames and 500,000 surnames. We compared GRAFT's performance at suggesting synonyms to 10 other algorithms, including phonetic encoding, string similarity algorithms, and machine and deep learning algorithms. The results show GRAFT's superiority with respect to both forenames and surnames and demonstrate its use as a tool to improve synonym suggestion.

2.7CRJun 26, 2019

Challenges for Security Assessment of Enterprises in the IoT Era

Yael Mathov, Noga Agmon, Asaf Shabtai et al.

For years, attack graphs have been an important tool for security assessment of enterprise networks, but IoT devices, a new player in the IT world, might threat the reliability of this tool. In this paper, we review the challenges that must be addressed when using attack graphs to model and analyze enterprise networks that include IoT devices. In addition, we propose novel ideas and countermeasures aimed at addressing these challenges.

9.7CRJun 24, 2019

Evaluating the Information Security Awareness of Smartphone Users

Ron Bitton, Kobi Boymgold, Rami Puzis et al.

Information security awareness (ISA) is a practice focused on the set of skills, which help a user successfully mitigate a social engineering attack. Previous studies have presented various methods for evaluating the ISA of both PC and mobile users. These methods rely primarily on subjective data sources such as interviews, surveys, and questionnaires that are influenced by human interpretation and sincerity. Furthermore, previous methods for evaluating ISA did not address the differences between classes of social engineering attacks. In this paper, we present a novel framework designed for evaluating the ISA of smartphone users to specific social engineering attack classes. In addition to questionnaires, the proposed framework utilizes objective data sources: a mobile agent and a network traffic monitor; both of which are used to analyze the actual behavior of users. We empirically evaluated the ISA scores assessed from the three data sources (namely, the questionnaires, mobile agent, and network traffic monitor) by conducting a long-term user study involving 162 smartphone users. All participants were exposed to four different security challenges that resemble real-life social engineering attacks. These challenges were used to assess the ability of the proposed framework to derive a relevant ISA score. The results of our experiment show that: (1) the self-reported behavior of the users differs significantly from their actual behavior; and (2) ISA scores derived from data collected by the mobile agent or the network traffic monitor are highly correlated with the users' success in mitigating social engineering attacks.

8.3CRApr 11, 2019

Deployment Optimization of IoT Devices through Attack Graph Analysis

Noga Agmon, Asaf Shabtai, Rami Puzis

The Internet of things (IoT) has become an integral part of our life at both work and home. However, these IoT devices are prone to vulnerability exploits due to their low cost, low resources, the diversity of vendors, and proprietary firmware. Moreover, short range communication protocols (e.g., Bluetooth or ZigBee) open additional opportunities for the lateral movement of an attacker within an organization. Thus, the type and location of IoT devices may significantly change the level of network security of the organizational network. In this paper, we quantify the level of network security based on an augmented attack graph analysis that accounts for the physical location of IoT devices and their communication capabilities. We use the depth-first branch and bound (DFBnB) heuristic search algorithm to solve two optimization problems: Full Deployment with Minimal Risk (FDMR) and Maximal Utility without Risk Deterioration (MURD). An admissible heuristic is proposed to accelerate the search. The proposed method is evaluated using a real network with simulated deployment of IoT devices. The results demonstrate (1) the contribution of the augmented attack graphs to quantifying the impact of IoT devices deployed within the organization on security, and (2) the effectiveness of the optimized IoT deployment.

14.8CRMar 6, 2019

Attack Graph Obfuscation

Rami Puzis, Hadar Polad, Bracha Shapira

Before executing an attack, adversaries usually explore the victim's network in an attempt to infer the network topology and identify vulnerabilities in the victim's servers and personal computers. Falsifying the information collected by the adversary post penetration may significantly slower lateral movement and increase the amount of noise generated within the victim's network. We investigate the effect of fake vulnerabilities within a real enterprise network on the attacker performance. We use the attack graphs to model the path of an attacker making its way towards a target in a given network. We use combinatorial optimization in order to find the optimal assignments of fake vulnerabilities. We demonstrate the feasibility of our deception-based defense by presenting results of experiments with a large scale real network. We show that adding fake vulnerabilities forces the adversary to invest a significant amount of effort, in terms of time and exploitability cost.

3.2HCMay 21, 2017

MindDesktop: a general purpose brain computer interface

Ori Ossmy, Ofir Tam, Rami Puzis et al.

Recent advances in electroencephalography (EEG) and electromyography (EMG) enable communication for people with severe disabilities. In this paper we present a system that enables the use of regular computers using an off-the-shelf EEG/EMG headset, providing a pointing device and virtual keyboard that can be used to operate any Windows based system, minimizing the user effort required for interacting with a personal computer. Effectiveness of the proposed system is evaluated by a usability study, indicating decreasing learning curve for completing various tasks. The proposed system is available in the link provided.

0.7LGJan 1, 2017

Classification of Smartphone Users Using Internet Traffic

Andrey Finkelstein, Ron Biton, Rami Puzis et al.

Today, smartphone devices are owned by a large portion of the population and have become a very popular platform for accessing the Internet. Smartphones provide the user with immediate access to information and services. However, they can easily expose the user to many privacy risks. Applications that are installed on the device and entities with access to the device's Internet traffic can reveal private information about the smartphone user and steal sensitive content stored on the device or transmitted by the device over the Internet. In this paper, we present a method to reveal various demographics and technical computer skills of smartphone users by their Internet traffic records, using machine learning classification models. We implement and evaluate the method on real life data of smartphone users and show that smartphone users can be classified by their gender, smoking habits, software programming experience, and other characteristics.

3.1CRJan 2, 2016

The Security of WebRTC

Ben Feher, Lior Sidi, Asaf Shabtai et al.

WebRTC is an API that allows users to share streaming information, whether it is text, sound, video or files. It is supported by all major browsers and has a flexible underlying infrastructure. In this study we review current WebRTC structure and security in the contexts of communication disruption, modification and eavesdropping. In addition, we examine WebRTC security in a few representative scenarios, setting up and simulating real WebRTC environments and attacks.

3.0CRMay 7, 2012

Detecting Spammers via Aggregated Historical Data Set

Eitan Menahem, Rami Puzis

The battle between email service providers and senders of mass unsolicited emails (Spam) continues to gain traction. Vast numbers of Spam emails are sent mainly from automatic botnets distributed over the world. One method for mitigating Spam in a computationally efficient manner is fast and accurate blacklisting of the senders. In this work we propose a new sender reputation mechanism that is based on an aggregated historical data-set which encodes the behavior of mail transfer agents over time. A historical data-set is created from labeled logs of received emails. We use machine learning algorithms to build a model that predicts the \emph{spammingness} of mail transfer agents in the near future. The proposed mechanism is targeted mainly at large enterprises and email service providers and can be used for updating both the black and the white lists. We evaluate the proposed mechanism using 9.5M anonymized log entries obtained from the biggest Internet service provider in Europe. Experiments show that proposed method detects more than 94% of the Spam emails that escaped the blacklist (i.e., TPR), while having less than 0.5% false-alarms. Therefore, the effectiveness of the proposed method is much higher than of previously reported reputation mechanisms, which rely on emails logs. In addition, the proposed method, when used for updating both the black and white lists, eliminated the need in automatic content inspection of 4 out of 5 incoming emails, which resulted in dramatic reduction in the filtering computational load.