Moshe Salhov

h-index: 10

12 papers

79 citations

Novelty: 40%

AI Score: 26

Ranked #163,304 of 194,257 authors (top 84%); #35,723 in LG (top 89%)

12 Papers

9.8 · LG · Jul 23, 2023

TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models

Guy Zamberg, Moshe Salhov, Ofir Lindenbaum et al.

Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we assume access only to contaminated data and present a diffusion-based probabilistic model for unsupervised anomaly detection. Our model is trained to learn the density of normal samples, using a dedicated rejection scheme to attenuate the influence of anomalies on the density estimate. At inference, samples in low-density regions are identified as anomalies. Experiments on real data show that our method improves detection over baselines. Furthermore, our method is relatively robust to the data dimension and does not require extensive hyperparameter tuning.

2.7 · IV · Jun 9, 2022

Cross-boosting of WNNM Image Denoising method by Directional Wavelet Packets

Amir Averbuch, Pekka Neittaanmäki, Valery Zheludev et al.

The paper presents an image denoising scheme that combines a method based on directional quasi-analytic wavelet packets (qWPs) with the state-of-the-art Weighted Nuclear Norm Minimization (WNNM) denoising algorithm. The qWP-based denoising method (qWPdn) consists of a multiscale qWP transform of the degraded image, adaptive localized soft thresholding of the transform coefficients using the Bivariate Shrinkage methodology, and restoration of the image from the thresholded coefficients of several decomposition levels. The combined method runs several iterations of the qWPdn and WNNM algorithms so that, at each iteration, the output of one algorithm boosts the input to the other. The proposed methodology couples the ability of qWPdn to capture edges and fine texture patterns, even in severely corrupted images, with the non-local self-similarity of real images that WNNM exploits. Multiple experiments comparing the proposed methodology with six advanced denoising algorithms, including WNNM, confirmed that the combined cross-boosting algorithm outperforms most of them in terms of both quantitative measures and visual perception quality.
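The abstract mentions thresholding qWP coefficients with the Bivariate Shrinkage methodology. Below is a minimal sketch of the classic bivariate shrinkage rule of Sendur and Selesnick, applied element-wise. It assumes the parent coefficients are already resampled to the child's grid and that the noise and local signal standard deviations are given. How qWPdn actually estimates these quantities is not stated here, and the function name `bivariate_shrink` is hypothetical.

```python
import numpy as np

def bivariate_shrink(y_child, y_parent, sigma_noise, sigma_signal):
    """Bivariate shrinkage rule applied element-wise.

    y_child      : noisy coefficients at the current decomposition level
    y_parent     : co-located noisy coefficients from the next coarser level
    sigma_noise  : noise standard deviation (assumed known)
    sigma_signal : local signal standard deviation estimate (scalar or array)
    """
    mag = np.sqrt(y_child**2 + y_parent**2)
    # Shrink the joint child/parent magnitude by sqrt(3) * sigma_n^2 / sigma,
    # clamping at zero (soft thresholding of the joint magnitude).
    shrunk = np.maximum(mag - np.sqrt(3) * sigma_noise**2 / sigma_signal, 0.0)
    # Gain applied to the child; defined as 0 where both coefficients vanish.
    gain = np.divide(shrunk, mag, out=np.zeros_like(mag), where=mag > 0)
    return gain * y_child
```

Coupling each coefficient to its parent lets large coarse-scale coefficients protect nearby fine-scale detail, which a scalar soft threshold cannot do.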

1.8 · LG · Apr 19, 2022

Imbalanced Classification via a Tabular Translation GAN

Jonathan Gradstein, Moshe Salhov, Yoav Tulpan et al.

When presented with a binary classification problem where the data exhibits severe class imbalance, most standard predictive methods may fail to accurately model the minority class. We present a model based on Generative Adversarial Networks which uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the synthesized samples to be close to the class boundary. Furthermore, we explore a selection criterion to retain the most useful of the synthesized samples. Experimental results using several downstream classifiers on a variety of tabular class-imbalanced datasets show that the proposed method improves average precision when compared to alternative re-weighting and oversampling techniques.

3.6 · IV · May 20, 2024

SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images

Shir Barzel, Moshe Salhov, Ofir Lindenbaum et al.

Modern cameras typically offer two image states: a minimally processed linear raw RGB image representing the raw sensor data, and a highly processed non-linear image state, such as sRGB. The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline. It can be helpful for computer vision tasks such as image deblurring and dehazing, and for color recognition in medical applications, where color accuracy is important. However, images are usually saved in non-linear states, and recovering CIE-XYZ images with conventional methods is not always possible. Classical methodologies tackle this issue by reversing the acquisition pipeline. More recently, supervised learning has been employed using paired CIE-XYZ and sRGB representations of identical images. However, obtaining a large-scale dataset of CIE-XYZ and sRGB pairs can be challenging. To reduce the reliance on large amounts of paired data, self-supervised learning (SSL) can be used to complement the paired data. This paper proposes a framework that uses SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches. The proposed framework is applied to the sRGB2XYZ dataset.
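As a point of reference for the "conventional methods" the abstract mentions, the sketch below performs the textbook sRGB-to-CIE-XYZ conversion. It inverts the sRGB transfer curve and then applies the standard D65 linear-RGB-to-XYZ matrix. This assumes the image was rendered with an ideal sRGB encoding. Real camera pipelines apply additional, often proprietary, tone and color processing that this naive inversion cannot undo, which motivates learned approaches. The function names are illustrative.

```python
import numpy as np

# Standard linear sRGB (D65 white point) to CIE-XYZ matrix.
SRGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def srgb_to_linear(c):
    """Invert the sRGB transfer curve; values are expected in [0, 1]."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def srgb_to_xyz(img):
    """Map an (..., 3) sRGB array to CIE-XYZ, assuming ideal sRGB encoding."""
    return srgb_to_linear(img) @ SRGB_TO_XYZ.T
```

With this matrix, sRGB white (1, 1, 1) maps to a luminance Y of 1.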

2.5 · AI · Jan 17, 2022 · Code

Search and Score-based Waterfall Auction Optimization

Dan Halbersberg, Matan Halevi, Moshe Salhov

Online advertising is a major source of income for many online companies. One common approach is to sell online advertisements via waterfall auctions, in which a publisher makes sequential price offers to ad networks. The publisher controls the order and prices of the waterfall in an attempt to maximize its revenue. In this work, we propose a methodology to learn a waterfall strategy from historical data by searching the space of possible waterfalls and selecting the one that yields the highest revenue. The contribution of this work is twofold. First, we propose a novel method to estimate the valuation distribution of each user with respect to each ad network. Second, we use the resulting valuation matrix to score candidate waterfalls within a procedure that iteratively searches local neighborhoods. Our framework guarantees that the waterfall revenue improves between iterations, ultimately converging to a local optimum. Real-world demonstrations show that the proposed method improves the total revenue of real-world waterfalls compared with manual expert optimization. Finally, the code and data are publicly available.
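The scoring-plus-local-search loop described above can be sketched as follows. This rests on a simplifying assumption: each user's valuation for each ad network is a known point value, whereas the paper estimates valuation distributions. A user is sold at the first offer in the waterfall whose price their valuation meets. The neighborhood moves shown (swap adjacent offers, nudge one price by a hypothetical `price_step`) are illustrative and not the authors' exact neighborhood definition.

```python
from itertools import product

def waterfall_revenue(waterfall, valuations):
    """Total revenue of a waterfall.

    waterfall  : list of (network, price) offers, tried in order
    valuations : list of dicts, one per user, mapping network -> valuation
    """
    total = 0.0
    for user in valuations:
        for network, price in waterfall:
            if user.get(network, 0.0) >= price:  # first accepted offer wins
                total += price
                break
    return total

def local_search(waterfall, valuations, price_step=0.1, max_iter=100):
    """Hill-climb over a small neighborhood; revenue never decreases."""
    best = list(waterfall)
    best_rev = waterfall_revenue(best, valuations)
    for _ in range(max_iter):
        neighbors = []
        for i in range(len(best) - 1):  # swap two adjacent offers
            cand = best.copy()
            cand[i], cand[i + 1] = cand[i + 1], cand[i]
            neighbors.append(cand)
        for i, delta in product(range(len(best)), (-price_step, price_step)):
            network, price = best[i]  # nudge a single price up or down
            if price + delta > 0:
                cand = best.copy()
                cand[i] = (network, price + delta)
                neighbors.append(cand)
        improved = False
        for cand in neighbors:  # keep the best strictly improving neighbor
            rev = waterfall_revenue(cand, valuations)
            if rev > best_rev:
                best, best_rev, improved = cand, rev, True
        if not improved:  # local optimum with respect to this neighborhood
            break
    return best, best_rev
```

Accepting only strict improvements is what makes revenue monotone across iterations and guarantees termination at a local optimum.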

3.3 · LG · Oct 12, 2020

$\ell_0$-based Sparse Canonical Correlation Analysis

Ofir Lindenbaum, Moshe Salhov, Amir Averbuch et al.

Canonical Correlation Analysis (CCA) models are powerful tools for studying the associations between two sets of variables. The canonically correlated representations, termed canonical variates, are widely used in unsupervised learning to analyze unlabeled, multi-modal registered datasets. Despite their success, CCA models may break down (or overfit) if the number of variables in either modality exceeds the number of samples. Moreover, a significant fraction of the variables often measures modality-specific information, so removing them helps identify the canonically correlated variates. Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of variables from two observed modalities. Sparsity is obtained by multiplying the input variables by stochastic gates, whose parameters are learned together with the CCA weights via an $\ell_0$-regularized correlation loss. We further propose $\ell_0$-Deep CCA, which addresses non-linear sparse CCA by modeling the correlated representations with deep nets. We demonstrate the efficacy of the method on several synthetic and real examples. Most notably, by gating nuisance input variables, our approach improves the extracted representations compared with other linear, non-linear, and sparse CCA-based models.
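The abstract does not spell out the gate parametrization. A common choice for $\ell_0$-style stochastic gates is a hard-clipped Gaussian relaxation: z_d = clip(mu_d + eps_d, 0, 1), with eps_d ~ N(0, sigma^2). Its expected $\ell_0$ norm has the closed form sum_d Phi(mu_d / sigma). The NumPy sketch below shows only this forward computation under that assumption. Learning `mu` jointly with the CCA weights would require an autodiff framework. The names and the default `sigma` are hypothetical.

```python
import math
import numpy as np

def sample_gates(mu, sigma=0.5, rng=None):
    """Hard-clipped Gaussian gates: z_d = clip(mu_d + eps_d, 0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, size=np.shape(mu))  # injected noise
    return np.clip(mu + eps, 0.0, 1.0)

def expected_l0(mu, sigma=0.5):
    """Expected number of open gates: sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma)."""
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0))))
               for m in np.ravel(mu))

# Gating a data matrix X (samples x variables), e.g. X_gated = X * sample_gates(mu)
```

Penalizing `expected_l0` pushes `mu_d` negative for nuisance variables, closing their gates, while the correlation loss keeps informative gates open.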

4.2 · LG · Feb 8, 2020

Majority Voting and the Condorcet's Jury Theorem

Hanan Shteingart, Eran Marom, Igor Itkin et al.

There is a striking relationship between an eighteenth-century political science theorem, "Condorcet's jury theorem" (1785), and a modern machine learning concept, the "Strength of Weak Learnability" (1990). The jury theorem states that majorities are more likely to choose correctly when individual votes are often correct and independent. Weak learnability describes a method for converting a weak learning algorithm into one that achieves arbitrarily high accuracy, and it underlies ensemble learning. Despite the intuitive statement of Condorcet's theorem, we could not find a compact, simple, and rigorous mathematical proof of it, either in classical machine learning handbooks or in published papers. We by no means claim to discover or reinvent a theory or a result. We simply want to offer a more accessible derivation of the theorem. We would be glad to see more instructors of introductory machine learning courses use the proof we provide here as an exercise to motivate ensemble learning.
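The quantity at the heart of the theorem can be computed directly. Consider n independent voters (n odd), each correct with probability p. The majority is correct with probability P_n = sum_{k=(n+1)/2}^{n} C(n, k) p^k (1 - p)^(n - k). The sketch below evaluates this sum numerically; it illustrates the statement and does not reproduce the paper's proof. The function name is illustrative.

```python
from math import comb

def majority_correct_prob(n, p):
    """Probability that a majority of n independent voters is correct.

    Each voter is correct with probability p; n must be odd so there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
```

For p = 0.6, three voters already give about 0.648, and the value keeps rising toward 1 as n grows. For p < 0.5 it falls instead. This is the ensemble intuition: many independent weak learners, each slightly better than chance, combine into a strong one.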

3.7 · LG · Jul 4, 2017

Kernel Scaling for Manifold Learning and Classification

Ofir Lindenbaum, Moshe Salhov, Arie Yeredor et al.

Kernel methods play a critical role in many machine learning algorithms. They are useful in manifold learning, classification, clustering, and other data analysis tasks. The kernel's scale parameter, also referred to as its bandwidth, strongly affects performance on the task at hand. We propose to set a scale parameter tailored to one of two types of tasks: classification and manifold learning. For manifold learning, we seek the scale that best captures the manifold's intrinsic dimension. For classification, we propose three methods for estimating the scale, each optimizing the classification results in a different sense. The proposed frameworks are tested on artificial and real datasets. The results show a high correlation between optimal classification rates and the estimated scales. Finally, we demonstrate the approach on a seismic event classification task.
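A well-known heuristic links the Gaussian kernel scale eps to the intrinsic dimension d. Compute S(eps) = sum_{i,j} exp(-||x_i - x_j||^2 / eps). Over a useful range of scales, the log-log slope d log S / d log eps is roughly d/2. The sketch below computes that slope curve. It is offered as background for the manifold-learning criterion mentioned above; it is an assumption that the paper's criterion takes exactly this form. Function names are illustrative.

```python
import numpy as np

def kernel_sum_curve(X, epsilons):
    """S(eps) = sum_{i,j} exp(-||x_i - x_j||^2 / eps) for each eps."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    return np.array([np.exp(-sq / eps).sum() for eps in epsilons])

def local_slopes(X, epsilons):
    """Numerical d log S / d log eps; roughly d/2 in the linear regime."""
    S = kernel_sum_curve(X, epsilons)
    return np.gradient(np.log(S), np.log(epsilons))
```

At very small eps only the diagonal survives (S tends to N), and at very large eps every entry tends to 1 (S tends to N^2). Both extremes have slope near 0, so a scale is typically picked from the intermediate region where the slope peaks. Twice that peak slope serves as a dimension estimate.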

1.0 · LG · Jul 12, 2016

Incomplete Pivoted QR-based Dimensionality Reduction

Amit Bermanis, Aviv Rotbart, Moshe Salhov et al.

High-dimensional big data appears in many research fields, such as image recognition, biology, and collaborative filtering. Exploring such data with classic algorithms is often hampered by the "curse of dimensionality". Therefore, dimensionality reduction methods are applied to the data prior to its analysis. Many of these methods are based on principal component analysis, which is statistically driven: they map the data into a low-dimensional subspace that preserves significant statistical properties of the high-dimensional data. As a consequence, such methods do not directly address the geometry of the data, reflected by the mutual distances between multidimensional data points. Thus, operations such as classification, anomaly detection, or other machine learning tasks may be affected. This work provides a dictionary-based framework for geometrically driven data analysis that includes dimensionality reduction, out-of-sample extension, and anomaly detection. It embeds high-dimensional data in a low-dimensional subspace. This embedding preserves the original high-dimensional geometry of the data up to a user-defined distortion rate. In addition, it identifies a subset of landmark data points that constitute a dictionary for the analyzed dataset. The dictionary provides a natural extension of the low-dimensional embedding to out-of-sample data points, which gives rise to a distortion-based criterion for anomaly detection. The suggested method is demonstrated on synthetic and real-world datasets and achieves good results for classification, anomaly detection, and out-of-sample tasks.
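The landmark selection described above can be illustrated with greedy column-pivoted Gram-Schmidt, an incomplete pivoted QR. At each step, pick the column with the largest residual norm and project it out of all columns. Stop once every residual is within a tolerance. The abstract does not say which matrix is factored (raw data or a kernel), so this sketch works on the columns of a generic matrix `A`. The absolute tolerance `tol` is a hypothetical stand-in for the user-defined distortion rate.

```python
import numpy as np

def incomplete_pivoted_qr(A, tol):
    """Greedily select columns of A until all residual column norms are <= tol.

    Returns the indices of the selected columns (the 'dictionary').
    """
    R = np.array(A, dtype=float, copy=True)  # residual matrix, updated in place
    picked = []
    norms = np.sum(R**2, axis=0)  # squared residual norm of each column
    while norms.max() > tol**2 and len(picked) < min(R.shape):
        j = int(np.argmax(norms))  # pivot: the worst-approximated column
        q = R[:, j] / np.sqrt(norms[j])  # new orthonormal direction
        R -= np.outer(q, q @ R)  # project that direction out of every column
        picked.append(j)
        norms = np.sum(R**2, axis=0)
    return picked
```

In exact arithmetic, this greedy order matches the column permutation returned by `scipy.linalg.qr(A, pivoting=True)`. The incomplete version simply stops early, so its cost scales with the dictionary size rather than with the full rank.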

7.9 · LG · Jun 28, 2016

Multi-View Kernel Consensus For Data Analysis

Moshe Salhov, Ofir Lindenbaum, Yariv Aizenbud et al.

The input feature set for many data-driven tasks is high-dimensional, while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low-dimensional structure imposed by the low-dimensional hidden parameters, using distance metrics that treat the set of attributes as a single monolithic set. However, the transformation of the low-dimensional phenomena into the measured high-dimensional observations might distort the distance metric. This distortion can affect the estimated low-dimensional geometric structure. In this paper, we propose to exploit the redundancy in the attribute domain by partitioning the attributes into multiple subsets, which we call views. The proposed methods use the agreement, also called consensus, between different views to extract valuable geometric information about the intrinsic relationships among observations, unifying the multiple views. This unification enhances the information provided by a single view or by a simple concatenation of views.

2.8 · ML · Nov 19, 2015

Diffusion Representations

Moshe Salhov, Amit Bermanis, Guy Wolf et al.

The Diffusion Maps framework is a kernel-based method for manifold learning and data analysis that defines diffusion similarities by imposing a Markovian process on the given dataset. Analyzing this process uncovers the intrinsic geometric structures in the data. Recently, it was suggested to replace the standard kernel with a measure-based kernel that incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure-based assumption. The measure-based diffusion kernel incorporates two separate, independent representations. The first determines a measure that correlates with a density representing normal behaviors and patterns in the data. The second consists of the analyzed multidimensional data points. In this paper, we present a representation framework for data analysis based on a closed-form decomposition of the measure-based kernel. The proposed representation preserves pairwise diffusion distances, does not depend on the data size, and is invariant to scale. For stationary data, no out-of-sample extension is needed to embed newly arrived data points in the representation space. Several aspects of the presented methodology are demonstrated on analytically generated data.
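For readers new to the framework this paper builds on, here is a minimal sketch of the standard diffusion map with a plain Gaussian kernel. It does not implement the paper's measure-based kernel or its closed-form decomposition. It builds the Markov matrix P = D^-1 K, obtains its right eigenvectors through the symmetric conjugate D^-1/2 K D^-1/2, and scales coordinates by eigenvalue^t. The function name and parameters are illustrative.

```python
import numpy as np

def diffusion_map(X, eps, n_components=2, t=1):
    """Classic Gaussian-kernel diffusion map of the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)  # pairwise affinities
    d = K.sum(axis=1)  # degrees; P = D^-1 K is the Markov matrix
    A = K / np.sqrt(np.outer(d, d))  # symmetric conjugate D^-1/2 K D^-1/2
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]  # sort eigenpairs by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P, normalized w.r.t. the stationary distribution,
    # so the first one is the constant vector (up to sign).
    psi = vecs / np.sqrt(d)[:, None] * np.sqrt(d.sum())
    # Skip the trivial pair (eigenvalue 1); weight coordinates by eigenvalue^t.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

With this normalization, Euclidean distances between rows of the output approximate diffusion distances at time t. The approximation becomes exact when all non-trivial components are kept.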

10.5 · LG · Aug 23, 2015

MultiView Diffusion Maps

Ofir Lindenbaum, Arie Yeredor, Moshe Salhov et al.

In this paper, we address the challenging task of multi-view dimensionality reduction. The goal is to exploit the availability of multiple views to extract a coherent low-dimensional representation of the data. The proposed method exploits the intrinsic relations within each view, as well as the mutual relations between views. Multi-view dimensionality reduction is achieved by defining a cross-view model in which an implied random walk process is constrained to hop between objects in the different views. The method is robust to scaling and insensitive to small structural changes in the data. We define new diffusion distances and analyze the spectra of the proposed kernel. We show that the proposed framework is useful for various machine learning applications, such as clustering, classification, and manifold learning. Finally, by fusing multi-sensor seismic data, we present a method for automatic identification of seismic events.
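One way to realize "a random walk constrained to hop between views" for two views is a 2N x 2N block kernel with zero diagonal blocks and off-diagonal blocks K1 K2 and K2 K1. A walker then always alternates between views. The sketch below builds that block kernel and row-normalizes it into a Markov matrix. Treat this as our reading of the abstract, not a verified reproduction of the paper's kernel. The Gaussian per-view kernels and the names `gaussian_kernel` and `multiview_markov` are assumptions.

```python
import numpy as np

def gaussian_kernel(X, eps):
    """Gaussian affinity matrix for the rows of X."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / eps)

def multiview_markov(X1, X2, eps1, eps2):
    """Row-stochastic 2N x 2N matrix whose walk always alternates views.

    X1, X2 hold the same N objects (rows aligned) measured in two views.
    """
    K1, K2 = gaussian_kernel(X1, eps1), gaussian_kernel(X2, eps2)
    n = K1.shape[0]
    Z = np.zeros((n, n))  # no within-view steps
    K = np.block([[Z, K1 @ K2], [K2 @ K1, Z]])  # symmetric, since K1, K2 are
    return K / K.sum(axis=1, keepdims=True)  # normalize rows to probabilities
```

Because the block kernel is symmetric, its spectrum can be computed with the same symmetric-conjugate trick used in standard diffusion maps.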