Michael Segal

CR
9 papers
49 citations
Novelty 50%
AI Score 43

9 Papers

LG May 26, 2022
Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms

Kiril Danilchenko, Michael Segal, Dan Vilenchik

E-commerce is the fastest-growing segment of the economy. Online reviews play a crucial role in helping consumers evaluate and compare products and services. As a result, fake reviews (opinion spam) are becoming more prevalent and negatively impacting customers and service providers. There are many reasons why it is hard to identify opinion spammers automatically, including the absence of reliable labeled data. This limitation precludes an off-the-shelf application of a machine learning pipeline. We propose a new method for classifying reviewers as spammers or benign, combining machine learning with a message-passing algorithm that capitalizes on the users' graph structure to compensate for the possible scarcity of labeled data. We devise a new way of sampling the labels for the training step (active learning), replacing the typical uniform sampling. Experiments on three large real-world datasets from Yelp.com show that our method outperforms state-of-the-art active learning approaches and also machine learning methods that use a much larger set of labeled data for training.

LG Aug 26, 2024
Provable Imbalanced Point Clustering

David Denisov, Dan Feldman, Shlomi Dolev et al.

We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$-centers to a set of points in $\mathbb{R}^d$, for any $d,k\geq 1$. To this end, we utilize \emph{coresets}, which, in the context of the paper, are essentially weighted sets of points in $\mathbb{R}^d$ that approximate the fitting loss for every model in a given set, up to a multiplicative factor of $1\pm\varepsilon$. We provide [Section 3 and Section E in the appendix] experiments that show the empirical contribution of our suggested methods for real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which by combining clustering algorithms yields better performance than each one separately.

NI May 12
Decentralized Multi-Channel MANET Power Optimization Using Graph Neural Networks

Tomer Alter, Nir Shlezinger, Michael Segal

The increasing demand for mobile ad hoc networks (MANETs) calls for decentralized mechanisms that can allocate transmit power across nodes and channels under stringent resource constraints. Existing optimization-based approaches, however, do not account for expected settings where each link includes multiple channels (e.g., multi-band signaling). Motivated by recent advances in machine learning for distributed optimization, we propose MANET-GNN, a graph neural network (GNN)-based algorithm for decentralized power allocation in multi-channel MANETs. MANET-GNN explicitly exploits the network topology, scales efficiently with the number of nodes and frequency bands, generalizes across topologies and channel conditions, and enables near-instantaneous inference suitable for real-time deployment. Our design builds on a constrained optimization formulation and employs a dedicated GNN architecture inspired by message passing, trained via an unsupervised procedure that is robust to noisy channel state information. Numerical evaluations demonstrate that MANET-GNN achieves high-throughput multi-channel communication across diverse MANET scenarios.

LG Nov 16, 2025
Linear time small coresets for k-mean clustering of segments with applications

David Denisov, Shlomi Dolev, Dan Feldman et al.

We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S},X) := \sum_{S \in \mathcal{S}} \min_{x \in X} D(S,x)$, where $D(S,x) := \int_{p \in S} |p - x| dp$ measures the total distance from each point along a segment to a center. Variants of this problem include handling outliers, employing alternative distance functions such as M-estimators, weighting distances to achieve balanced clustering, or enforcing unique cluster assignments. For any $\varepsilon > 0$, an $\varepsilon$-coreset is a weighted subset $C \subseteq \mathbb{R}^d$ that approximates $D(\mathcal{S},X)$ within a factor of $1 \pm \varepsilon$ for any set of $k$ centers, enabling efficient streaming, distributed, or parallel computation. We propose the first coreset construction that provably handles arbitrary input segments. For constant $k$ and $\varepsilon$, it produces a coreset of size $O(\log^2 n)$ computable in $O(nd)$ time. Experiments, including a real-time video tracking application, demonstrate substantial speedups with minimal loss in clustering accuracy, confirming both the practical efficiency and theoretical guarantees of our method.
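To make the cost definition above concrete, here is a minimal sketch that evaluates $D(S,x)$ and $D(\mathcal{S},X)$ numerically. It is not the paper's coreset construction: it only approximates the integral with a midpoint rule, and the function names and the `samples` parameter are assumptions introduced for illustration.

```python
import math

def segment_point_cost(a, b, x, samples=1000):
    """Approximate D(S, x), the integral of |p - x| over the segment S = [a, b].

    Midpoint rule: split the segment into `samples` equal pieces, evaluate the
    distance at each piece's midpoint, and scale by the segment length.
    """
    length = math.dist(a, b)
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples  # parameter of the i-th midpoint in [0, 1]
        p = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        total += math.dist(p, x)
    return length * total / samples

def clustering_cost(segments, centers, samples=1000):
    """D(S, X): each segment is charged to the center with the smallest D(S, x)."""
    return sum(
        min(segment_point_cost(a, b, x, samples) for x in centers)
        for a, b in segments
    )
```

For the segment from (0, 0) to (2, 0), a center at an endpoint gives a cost of about 2 and a center at the midpoint gives about 1, so `clustering_cost` with both centers available charges the segment to the midpoint.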

CR May 28, 2019
Privacy Vulnerabilities of Dataset Anonymization Techniques

Eyal Nussbaum, Michael Segal

Vast amounts of information of all types are collected daily about people by governments, corporations and individuals. The information is collected when users register for or use on-line applications, receive health-related services, use their mobile phones, utilize search engines, or perform common daily activities. As a result, there is an enormous quantity of privately-owned records that describe individuals' finances, interests, activities, and demographics. These records often include sensitive data and may violate the privacy of the users if published. The common approach to safeguarding user information, or data in general, is to limit access to the storage (usually a database) by using an authentication and authorization protocol. This way, only users with legitimate permissions can access the user data. In many cases, though, the publication of user data for statistical analysis and research can be extremely beneficial for both academic and commercial uses, such as statistical research and recommendation systems. To maintain user privacy when such a publication occurs, many databases employ anonymization techniques, either on the query results or on the data itself. In this paper we examine variants of two such techniques, "data perturbation" and "query-set-size control", and discuss their vulnerabilities. Data perturbation changes the values of records in the dataset while maintaining a level of accuracy over the resulting queries. We focus on a relatively new data perturbation method called NeNDS and show a possible partial-knowledge attack on its privacy. Query-set-size control allows publication of a query result only if at least a minimum number, k, of records satisfy the query parameters. We show that some query types relying on this method may still be used to extract hidden information, and prove that others maintain privacy even when multiple queries are used.
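As a rough illustration of query-set-size control, the sketch below answers a sum query only when at least k records match. It then shows a well-known differencing pattern that can still isolate one record; this is a generic textbook example, not necessarily one of the attacks analyzed in the paper. The function `controlled_sum`, the table, and all field names and values are hypothetical.

```python
def controlled_sum(records, predicate, field, k):
    """Sum `field` over matching records, refusing (None) if fewer than k match."""
    matching = [r for r in records if predicate(r)]
    if len(matching) < k:
        return None  # query set too small: result withheld
    return sum(r[field] for r in matching)

# Hypothetical toy table; every value here is invented for illustration.
records = [
    {"name": "a", "dept": "x", "age": 30, "salary": 100},
    {"name": "b", "dept": "x", "age": 31, "salary": 120},
    {"name": "c", "dept": "x", "age": 32, "salary": 90},
    {"name": "d", "dept": "x", "age": 45, "salary": 200},
]
K = 3

# Asking about the single 45-year-old directly is refused (only 1 match).
direct = controlled_sum(records, lambda r: r["age"] == 45, "salary", K)

# Two permitted queries whose query sets differ by exactly that one record.
everyone = controlled_sum(records, lambda r: r["dept"] == "x", "salary", K)
under_40 = controlled_sum(
    records, lambda r: r["dept"] == "x" and r["age"] < 40, "salary", K
)

# Subtracting the two permitted answers recovers the withheld value.
leaked = everyone - under_40
```

Here `direct` is withheld, yet `leaked` equals the salary of the one record the direct query was meant to protect, which is why a minimum set size alone does not guarantee privacy across multiple queries.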

CR May 23, 2019
Approximate String Matching for DNS Anomaly Detection

Roni Mateless, Michael Segal

In this paper we propose a novel approach to identifying anomalies in DNS traffic. The traffic time-point data is transformed into a string, which is used by a new fast approximate string matching algorithm to detect anomalies. Our approach is generic in nature and allows fast adaptation to different types of traffic. We evaluate the approach on a large public dataset covering 10 days of DNS traffic, discovering more than an order of magnitude more DNS attacks than the auto-regression baseline. Moreover, an additional comparison was made with other common regressors such as Linear Regression, Lasso, Random Forest and KNN, all of which confirmed the superiority of our approach.
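The abstract does not spell out the encoding or the matching algorithm, so the following is only a sketch of the general idea under stated assumptions: traffic counts are turned into letters by fixed-width binning, and a window is flagged when its string is more than `max_edits` Levenshtein edits away from a reference string. The binning scheme, the use of plain edit distance instead of the paper's fast algorithm, and all names are assumptions.

```python
def to_symbols(values, bin_width):
    """Map each non-negative traffic count to a letter by fixed-width binning.

    Bin 0 becomes 'a', bin 1 becomes 'b', and so on; large values are capped at 'z'.
    """
    return "".join(chr(ord("a") + min(int(v // bin_width), 25)) for v in values)

def edit_distance(s, t):
    """Levenshtein distance via the standard dynamic program, one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def is_anomalous(window, reference, bin_width, max_edits):
    """Flag a window whose symbol string is too far from the reference string."""
    distance = edit_distance(
        to_symbols(window, bin_width), to_symbols(reference, bin_width)
    )
    return distance > max_edits
```

With `bin_width=10`, the counts `[5, 15, 25]` encode to `"abc"`; a window that differs from the reference in one bin is one edit away, so it is flagged only when `max_edits` is 0.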

CR Jan 4, 2019
Breaching the privacy of connected vehicles network

Vladimir Kaplun, Michael Segal

A connected vehicles network is designed to provide a secure and private way for drivers to use the roads in a certain area as efficiently as possible. In the scenario of car-to-access-point connectivity (Wi-Fi, 3G, LTE), the vehicles are connected through a central authority such as a cloud. Thus, they can be monitored and analyzed by the cloud, which can provide certain services to the driver, e.g., usage-based insurance (UBI), entertainment services, navigation, etc. The main objective of this work is to show that by analyzing the information about a driver that is provided to usage-based insurance companies, it is possible to obtain additional private data, even if the basic data does not, at first glance, seem harmful. In this work, we present an analysis of a novel approach for reconstructing the path of a driver from other driving attributes, such as cornering events, average speed and total driving time. We show that, in some cases, it is possible to reconstruct the path of a driver without knowing the target point of the trip.

CR Aug 6, 2015
Vehicle to Vehicle Authentication

Shlomi Dolev, Łukasz Krzywiecki, Nisha Panwar et al.

In the near future, vehicles will establish spontaneous connections over a wireless radio channel, coordinating actions and information. Vehicles will exchange warning messages over the wireless radio channel through Dedicated Short Range Communication (IEEE 1609) over the Wireless Access in Vehicular Environment (802.11p). Unfortunately, the wireless communication among vehicles is vulnerable to security threats that may lead to very serious safety hazards. Therefore, the warning messages being exchanged must incorporate an authentication factor such that the recipient is willing to verify and accept the message in a timely manner.

CR Jul 16, 2015
Vehicle Authentication via Monolithically Certified Public Key and Attributes

Shlomi Dolev, Łukasz Krzywiecki, Nisha Panwar et al.

Vehicular networks are used to coordinate actions among vehicles in traffic by the use of wireless transceivers (pairs of transmitters and receivers). Unfortunately, the wireless communication among vehicles is vulnerable to security threats that may lead to very serious safety hazards. In this work, we propose a viable solution for coping with Man-in-the-Middle attacks. Conventionally, Public Key Infrastructure (PKI) is utilized for secure communication with a pre-certified public key. However, secure vehicle-to-vehicle communication requires additional means of verification in order to avoid impersonation attacks. To the best of our knowledge, this is the first work that proposes to certify both the public key and out-of-band sense-able static attributes to enable mutual authentication of the communicating vehicles. Vehicle owners are bound to preprocess (periodically) a certificate for both a public key and a list of fixed unchangeable attributes of the vehicle. Furthermore, the proposed approach is shown to be adaptable with regard to existing authentication protocols. We illustrate the security verification of the proposed protocol using a detailed proof in Spi calculus.