OC · Aug 27, 2018
A numerical scheme for a mean field game in some queueing systems based on Markov chain approximation method
Erhan Bayraktar, Amarjit Budhiraja, Asaf Cohen
We use the Markov chain approximation method to construct approximations for the solution of the mean field game (MFG) with reflecting barriers studied in Bayraktar, Budhiraja, and Cohen (2017). The MFG is formulated in terms of a controlled reflected diffusion with a cost function that depends on the reflection terms in addition to the standard variables: state, control, and the mean field term. This MFG arises from the asymptotic analysis of an $N$-player game for single server queues with strategic servers. By showing that our scheme is an almost contraction, we establish the convergence of this numerical scheme over a small time interval.
LG · May 6
Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees
Ahmad Aghapour, Erhan Bayraktar, Asaf Cohen
We study zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, including inpainting and super-resolution. In these problems, the observation determines only part of the unknown signal. The remaining degrees of freedom must be sampled according to the correct conditional data distribution. Existing projection-based samplers enforce measurement consistency by correcting the observed component during reverse diffusion. However, measurement consistency alone does not determine how probability mass should be distributed along the feasible set, and this can lead to biased conditional samples. We analyze this issue through a normal--tangent decomposition of the score function. For Gaussian noising, the observed-direction score is exactly determined by the measurement; only the tangent conditional score is unknown. We prove that the error from replacing this score by the unconditional tangent score is upper bounded by a dimension-free conditional mutual information between observed and unobserved components. This gives an information-theoretic decomposition into initialization and pathwise score-mismatch errors. Motivated by the theory, we propose a projected-Langevin initialization followed by guided reverse denoising, which outperforms a strong projection-based baseline in inpainting and super-resolution experiments.
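For intuition, the normal--tangent decomposition can be written out in the simplest case. The sketch below rests on assumptions that are not taken from the paper: a coordinate-mask measurement (as in inpainting), so that $x_0 = (y, z)$ with $y$ observed and $z$ unobserved, and variance-exploding Gaussian noising $x_t = x_0 + \sigma_t \varepsilon$. The superscripts $o$ and $u$ (observed and unobserved blocks of $x_t$) are illustrative notation.

```latex
% Assumed setting (illustrative, not the paper's notation):
%   x_0 = (y, z): y observed coordinates, z unobserved coordinates
%   x_t = x_0 + \sigma_t \varepsilon,  \varepsilon \sim \mathcal{N}(0, I)
% The Gaussian kernel factorizes over the two coordinate blocks, so
\begin{align*}
p_t(x_t \mid y)
  &= \mathcal{N}\!\left(x_t^{o};\, y,\, \sigma_t^2 I\right)
     \int p(z \mid y)\, \mathcal{N}\!\left(x_t^{u};\, z,\, \sigma_t^2 I\right) dz, \\
\nabla_{x_t^{o}} \log p_t(x_t \mid y)
  &= -\frac{x_t^{o} - y}{\sigma_t^2}, \\
\nabla_{x_t^{u}} \log p_t(x_t \mid y)
  &= \nabla_{x_t^{u}} \log \int p(z \mid y)\,
     \mathcal{N}\!\left(x_t^{u};\, z,\, \sigma_t^2 I\right) dz .
\end{align*}
```

Under these assumptions the observed-direction score depends only on $y$ and $\sigma_t$, while the tangent score still requires the conditional law $p(z \mid y)$, which a pretrained unconditional model does not provide.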
IT · Feb 8, 2017
Optimal Dynamic Routing for the Wireless Relay Channel
Asaf Cohen, Dennis Goeckel, Omer Gurewitz et al.
Consider a communication network with a source, a relay and a destination. Each time interval, the source may dynamically choose between a few possible coding schemes, based on the channel state, traffic pattern and its own queue status. For example, the source may choose between a direct route to the destination and a relay-assisted scheme. Clearly, due to the difference in the performance achieved, as well as the resources each scheme uses, a sender might wish to choose the most appropriate one based on its status. In this work, we formulate the problem as a Semi-Markov Decision Process. This formulation allows us to find an optimal policy, expressed as a function of the number of packets in the source queue and other parameters. In particular, we show a general solution which covers various configurations, including different packet size distributions and varying channels. Furthermore, for the case of exponential transmission times, we analytically prove the optimal policy has a threshold structure, that is, there is a unique value of a single parameter which determines which scheme (or route) is optimal. Results are also validated with simulations for several interesting models.
IT · Dec 22, 2025
On Cost-Aware Sequential Hypothesis Testing with Random Costs and Action Cancellation
George Vershinin, Asaf Cohen, Omer Gurewitz
We study a variant of cost-aware sequential hypothesis testing in which a single active Decision Maker (DM) selects actions with positive, random costs to identify the true hypothesis under an average error constraint, while minimizing the expected total cost. The DM may abort an in-progress action, yielding no sample, by truncating its realized cost at a smaller, tunable deterministic limit, which we term a per-action deadline. We analyze how this cancellation option can be exploited under two cost-revelation models: ex-post, where the cost is revealed only after the sample is obtained, and ex-ante, where the cost accrues before sample acquisition. In the ex-post model, per-action deadlines do not affect the expected total cost, and the cost-error tradeoffs coincide with the baseline obtained by replacing deterministic costs with cost means. In the ex-ante model, we show how per-action deadlines inflate the expected number of times actions are applied, and that the resulting expected total cost can be reduced to the constant-cost setting by introducing an effective per-action cost. We characterize when deadlines are beneficial and study several families in detail.
LG · Jul 25, 2025
Secure Best Arm Identification in the Presence of a Copycat
Asaf Cohen, Onur Günlü
Consider the problem of best arm identification with a security constraint. Specifically, assume a setup of stochastic linear bandits with $K$ arms of dimension $d$. In each arm pull, the player receives a reward that is the sum of the dot product of the arm with an unknown parameter vector and independent noise. The player's goal is to identify the best arm after $T$ arm pulls. Moreover, assume a copycat Chloe is observing the arm pulls. The player wishes to keep Chloe ignorant of the best arm. While a minimax--optimal algorithm identifies the best arm with an $\Omega\left(\frac{T}{\log(d)}\right)$ error exponent, it easily reveals its best-arm estimate to an outside observer, as the best arms are played more frequently. A naive secure algorithm that plays all arms equally results in an $\Omega\left(\frac{T}{d}\right)$ exponent. In this paper, we propose a secure algorithm that plays with \emph{coded arms}. The algorithm does not require any key or cryptographic primitives, yet achieves an $\Omega\left(\frac{T}{\log^2(d)}\right)$ exponent while revealing almost no information on the best arm.
IT · Dec 27, 2021
Universal Randomized Guessing Subjected to Distortion
Asaf Cohen, Neri Merhav
In this paper, we consider the problem of guessing a sequence subject to a distortion constraint. Specifically, we assume the following game between Alice and Bob: Alice has a sequence $\mathbf{x}$ of length $n$. Bob wishes to guess $\mathbf{x}$, yet he is satisfied with finding any sequence $\hat{\mathbf{x}}$ which is within a given distortion $D$ from $\mathbf{x}$. Thus, he successively submits queries to Alice, until receiving an affirmative answer, stating that his guess was within the required distortion. Finding guessing strategies which minimize the number of guesses (the \emph{guesswork}), and analyzing its properties (e.g., its $\rho$--th moment) has several applications in information security, source and channel coding. Guessing subject to a distortion constraint is especially useful when considering contemporary biometrically--secured systems, where the "password" which protects the data is not a single, fixed vector but rather a \emph{ball of feature vectors} centered at some $\mathbf{x}$, and any feature vector within the ball results in acceptance. We formally define the guessing problem under distortion in \emph{four different setups}: memoryless sources, guessing through a noisy channel, sources with memory and individual sequences. We suggest a randomized guessing strategy which is asymptotically optimal for all setups and is \emph{five--fold universal}, as it is independent of the source statistics, the channel, the moment to be optimized, the distortion measure and the distortion level.
IR · Nov 27, 2019
Learning a faceted customer segmentation for discovering new business opportunities at Intel
Itay Lieder, Meirav Segal, Eran Avidan et al.
For sales and marketing organizations within large enterprises, identifying and understanding new markets, customers and partners is a key challenge. Intel's Sales and Marketing Group (SMG) faces similar challenges while growing in new markets and domains and evolving its existing business. In today's complex technological and commercial landscape, there is a need for intelligent automation supporting a fine-grained understanding of businesses in order to help SMG sift through millions of companies across many geographies and languages and identify relevant directions. We present a system developed in our company that mines millions of public business web pages, and extracts a faceted customer representation. We focus on two key customer aspects that are essential for finding relevant opportunities: industry segments (ranging from broad verticals such as healthcare, to more specific fields such as 'video analytics') and functional roles (e.g., 'manufacturer' or 'retail'). To address the challenge of labeled data collection, we enrich our data with external information gleaned from Wikipedia, and develop a semi-supervised multi-label, multi-lingual deep learning model that parses customer website texts and classifies them into their respective facets. Our system scans and indexes companies as part of a large-scale knowledge graph that currently holds tens of millions of connected entities with thousands being fetched, enriched and connected to the graph by the hour in real time, and also supports knowledge and insight discovery. In experiments conducted in our company, we are able to significantly boost the performance of sales personnel in the task of discovering new customers and commercial partnership opportunities.
IT · May 29, 2018
Why Botnets Work: Distributed Brute-Force Attacks Need No Synchronization
Salman Salamatian, Wasim Huleihel, Ahmad Beirami et al.
In September 2017, the McAfee Labs quarterly report estimated that brute-force attacks represent 20\% of total network attacks, making them the most prevalent type of attack ex-aequo with browser-based vulnerabilities. These attacks sometimes have catastrophic consequences, and understanding their fundamental limits may play an important role in the risk assessment of password-secured systems, and in the design of better security protocols. While some solutions exist to prevent online brute-force attacks that arise from one single IP address, attacks performed by botnets are more challenging. In this paper, we analyze these distributed attacks by using a simplified model. Our aim is to understand the impact of distribution and asynchronization on the overall computational effort necessary to breach a system. Our result is based on Guesswork, a measure of the number of queries (guesses) required of an adversary before a correct sequence, such as a password, is found in an optimal attack. Guesswork is a direct surrogate for time and computational effort of guessing a sequence from a set of sequences with associated likelihoods. We model the lack of synchronization by a worst-case optimization in which the queries made by multiple adversarial agents are received in the worst possible order for the adversary, resulting in a min-max formulation. We show that, even without synchronization, and for sequences of growing length, the asymptotic optimal performance is achievable by using randomized guesses drawn from an appropriate distribution. Therefore, randomization is key for distributed asynchronous attacks. In other words, asynchronous guessers can asymptotically perform brute-force attacks as efficiently as synchronized guessers.
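To make the notion of Guesswork concrete, here is a minimal Python sketch for a single guesser and a small known distribution. It compares the expected number of guesses for an ordered attack (most likely candidate first) with that of memoryless randomized guessing, where each guess is drawn i.i.d. from a guessing distribution. The function names and the square-root tilted guessing distribution are illustrative assumptions, not necessarily the construction used in the paper.

```python
from math import sqrt, isclose


def optimal_guesswork(probs):
    """Expected number of guesses when candidates are tried in
    decreasing order of probability (no candidate is repeated)."""
    ordered = sorted(probs, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


def randomized_guesswork(probs, guess_dist):
    """Expected number of i.i.d. guesses drawn from guess_dist until the
    secret is hit. Given secret x, the count is geometric with mean
    1 / guess_dist[x]. Assumes guess_dist[x] > 0 wherever probs[x] > 0."""
    return sum(p / q for p, q in zip(probs, guess_dist) if p > 0)


def sqrt_tilted(probs):
    """Guessing distribution proportional to sqrt(p). It minimizes the
    expectation computed by randomized_guesswork (Cauchy-Schwarz)."""
    roots = [sqrt(p) for p in probs]
    total = sum(roots)
    return [r / total for r in roots]


# Hypothetical toy distribution over four candidate passwords.
probs = [0.5, 0.25, 0.125, 0.125]

ordered_cost = optimal_guesswork(probs)                     # 1.875
matched_cost = randomized_guesswork(probs, probs)           # 4.0
tilted_cost = randomized_guesswork(probs, sqrt_tilted(probs))
```

In this toy example, randomized guessing costs more than the ordered attack, and the tilted guessing distribution does better than drawing guesses from the password distribution itself. The abstract's claim concerns the asymptotic regime of growing sequence length, which this finite example does not capture.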
CR · Aug 15, 2015
Universal Anomaly Detection: Algorithms and Applications
Shachar Siboni, Asaf Cohen
Modern computer threats are far more complicated than those seen in the past. They are constantly evolving, altering their appearance, perpetually changing disguise. Under such circumstances, detecting known threats, a fortiori zero-day attacks, requires new tools, which are able to capture the essence of their behavior, rather than some fixed signatures. In this work, we propose novel universal anomaly detection algorithms, which are able to learn the normal behavior of systems and alert for abnormalities, without any prior knowledge on the system model, nor any knowledge on the characteristics of the attack. The suggested method utilizes the Lempel-Ziv universal compression algorithm in order to optimally give probability assignments for normal behavior (during learning), then estimate the likelihood of new data (during operation) and classify it accordingly. The suggested technique is generic, and can be applied to different scenarios. Indeed, we apply it to key problems in computer security. The first is detecting Botnets Command and Control (C&C) channels. A Botnet is a logical network of compromised machines which are remotely controlled by an attacker using a C&C infrastructure, in order to perform malicious activities. We derive a detection algorithm based on timing data, which can be collected without deep inspection, from open as well as encrypted flows. We evaluate the algorithm on real-world network traces, showing how a universal, low complexity C&C identification system can be built, with high detection rates and low false-alarm probabilities. Further applications include malicious tools detection via system calls monitoring and data leakage identification.
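The learn-then-score idea can be illustrated with a deliberately simplified Python sketch. It builds an LZ78 phrase tree from a training sequence, then parses new data against the frozen tree and reports how often the parse falls off the tree. This restart rate is a crude stand-in for the likelihood estimate described in the abstract, not the authors' probability assignment; the function names and the toy sequences are hypothetical.

```python
def build_lz78_trie(sequence):
    """Incrementally parse `sequence` into LZ78 phrases and return the
    phrase tree as nested dicts (symbol -> child node)."""
    root = {}
    node = root
    for symbol in sequence:
        if symbol in node:
            node = node[symbol]      # keep extending the current phrase
        else:
            node[symbol] = {}        # new phrase: longest match + 1 symbol
            node = root              # start the next phrase at the root
    return root


def restarts_per_symbol(trie, sequence):
    """Parse `sequence` against the frozen trie. Every symbol that cannot
    extend the current path counts as a restart. Data resembling the
    training sequence follows long paths and restarts rarely."""
    if not sequence:
        return 0.0
    node = trie
    restarts = 0
    for symbol in sequence:
        if symbol in node:
            node = node[symbol]
        else:
            restarts += 1
            node = trie              # consume the symbol, return to root
    return restarts / len(sequence)


# Hypothetical toy data: a periodic "normal" pattern versus unseen symbols.
trie = build_lz78_trie("ab" * 50)
normal_score = restarts_per_symbol(trie, "ab" * 10)
anomalous_score = restarts_per_symbol(trie, "cd" * 10)    # 1.0
is_anomalous = anomalous_score > normal_score
```

A detector in this style would compare the score with a threshold chosen on held-out normal data. For timing-based detection as in the abstract, the inter-arrival times would first have to be quantized into a finite alphabet, a step omitted here.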