Michael Werman

h-index1

29papers

230citations

Novelty43%

AI Score44

Ranked #73,297 of 201,326 authors (top 36%)#25,653 in CV (top 44%)

29 Papers

25.9CLMay 28

COMPOSE: Composing Future Theorems from Citations and Formal Structure

David Busbib, Michael Werman

A plausible future mathematical claim must satisfy two constraints: it should follow the direction of prior work and respect the formal dependencies that constrain what can validly follow. Existing approaches typically model only one of these sources, producing claims that are either weakly grounded or insufficiently motivated. We introduce grounded future mathematical generation, where the goal is to generate a plausible future theorem-like claim for an anchor paper using two complementary sources of context: its scientific citation graph and aligned formal theorem dependency graph. To address this setting, we propose COMPOSE, a dual-graph framework that conditions a language model on both scientific citation context and formal theorem structure. To support this setting, we construct a dataset of 108K paired scientific-formal graph examples from arXiv and Mathlib, together with a benchmark of 47K future papers from 2024--2025. Experiments show that COMPOSE outperforms strong baselines on retrieval to real future papers and achieves the best overall performance under LLM-judge evaluation, producing more grounded and mathematically richer outputs. These results show that future mathematical generation benefits from combining scientific context with formal structure. Project page is available at https://david-busbib.github.io/COMPOSE-page/.

CVDec 10, 2022

An approach to robust ICP initialization

Alexander Kolpakov, Michael Werman

In this note, we propose an approach to initialize the Iterative Closest Point (ICP) algorithm to match unlabelled point clouds related by rigid transformations. The method is based on matching the ellipsoids defined by the points' covariance matrices and then testing the various principal half-axes matchings that differ by elements of a finite reflection group. We derive bounds on the robustness of our approach to noise and numerical experiments confirm our theoretical findings.

CVMar 5, 2023

Robust affine point matching via quadratic assignment on Grassmannians

Alexander Kolpakov, Michael Werman

Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds. The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian. For this purpose, an indefinite relaxation of the Quadratic Assignment Problem (QAP) is used, and several approaches to affine feature matching are studied and compared. Experiments demonstrate that RoAM is more robust to noise and point discrepancy than previous methods.

CVJul 3, 2022

DecisioNet: A Binary-Tree Structured Neural Network

Noam Gottlieb, Michael Werman

Deep neural networks (DNNs) and decision trees (DTs) are both state-of-the-art classifiers. DNNs perform well due to their representational learning capabilities, while DTs are computationally efficient as they perform inference along one route (root-to-leaf) that is dependent on the input data. In this paper, we present DecisioNet (DN), a binary-tree structured neural network. We propose a systematic way to convert an existing DNN into a DN to create a lightweight version of the original model. DecisioNet takes the best of both worlds - it uses neural modules to perform representational learning and utilizes its tree structure to perform only a portion of the computations. We evaluate various DN architectures, along with their corresponding baseline models on the FashionMNIST, CIFAR10, and CIFAR100 datasets. We show that the DN variants achieve similar accuracy while significantly reducing the computational cost of the original network.

NEMar 20, 2022

Fully Convolutional Fractional Scaling

Michael Soloveitchik, Michael Werman

We introduce a fully convolutional fractional scaling component, FCFS. Fully convolutional networks can be applied to any size input and previously did not support non-integer scaling. Our architecture is simple with an efficient single layer implementation. Examples and code implementations of three common scaling methods are published.

8.1LGApr 4

Neural Global Optimization via Iterative Refinement from Noisy Samples

Qusay Muzaffar, David Levin, Michael Werman

Global optimization of black-box functions from noisy samples is a fundamental challenge in machine learning and scientific computing. Traditional methods such as Bayesian Optimization often converge to local minima on multi-modal functions, while gradient-free methods require many function evaluations. We present a novel neural approach that learns to find global minima through iterative refinement. Our model takes noisy function samples and their fitted spline representation as input, then iteratively refines an initial guess toward the true global minimum. Trained on randomly generated functions with ground truth global minima obtained via exhaustive search, our method achieves a mean error of 8.05 percent on challenging multi-modal test functions, compared to 36.24 percent for the spline initialization, a 28.18 percent improvement. The model successfully finds global minima in 72 percent of test cases with error below 10 percent, demonstrating learned optimization principles rather than mere curve fitting. Our architecture combines encoding of multiple modalities including function values, derivatives, and spline coefficients with iterative position updates, enabling robust global optimization without requiring derivative information or multiple restarts.

CVOct 3, 2023

Beyond the Benchmark: Detecting Diverse Anomalies in Videos

Yoav Arad, Michael Werman

Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations. However, current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection. This narrow focus restricts the advancement of VAD models. In this research, we advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries. To facilitate this, we introduce two datasets, HMDB-AD and HMDB-Violence, to challenge models with diverse action-based anomalies. These datasets are derived from the HMDB51 action recognition dataset. We further present Multi-Frame Anomaly Detection (MFAD), a novel method built upon the AI-VAD framework. AI-VAD utilizes single-frame features such as pose estimation and deep image encoding, and two-frame features such as object velocity. They then apply a density estimation algorithm to compute anomaly scores. To address complex multi-frame anomalies, we add a deep video encoding features capturing long-range temporal dependencies, and logistic regression to enhance final score calculation. Experimental results confirm our assumptions, highlighting existing models limitations with new anomaly types. MFAD excels in both simple and complex anomaly detection scenarios.

CVApr 3, 2025

CanonNet: Canonical Ordering and Curvature Learning for Point Cloud Analysis

Benjy Friedmann, Michael Werman

Point cloud processing poses two fundamental challenges: establishing consistent point ordering and effectively learning fine-grained geometric features. Current architectures rely on complex operations that limit expressivity while struggling to capture detailed surface geometry. We present CanonNet, a lightweight neural network composed of two complementary components: (1) a preprocessing pipeline that creates a canonical point ordering and orientation, and (2) a geometric learning framework where networks learn from synthetic surfaces with precise curvature values. This modular approach eliminates the need for complex transformation-invariant architectures while effectively capturing local geometric properties. Our experiments demonstrate state-of-the-art performance in curvature estimation and competitive results in geometric descriptor tasks with significantly fewer parameters (\textbf{100X}) than comparable methods. CanonNet's efficiency makes it particularly suitable for real-world applications where computational resources are limited, demonstrating that mathematical preprocessing can effectively complement neural architectures for point cloud analysis. The code for the project is publicly available \hyperlink{https://benjyfri.github.io/CanonNet/}{https://benjyfri.github.io/CanonNet/}.

LGNov 7, 2024

The Fibonacci Network: A Simple Alternative for Positional Encoding

Yair Bleiberg, Michael Werman

Coordinate-based Multi-Layer Perceptrons (MLPs) are known to have difficulty reconstructing high frequencies of the training data. A common solution to this problem is Positional Encoding (PE), which has become quite popular. However, PE has drawbacks. It has high-frequency artifacts and adds another hyper-hyperparameter, just like batch normalization and dropout do. We believe that under certain circumstances PE is not necessary, and a smarter construction of the network architecture together with a smart training method is sufficient to achieve similar results. In this paper, we show that very simple MLPs can quite easily output a frequency when given input of the half-frequency and quarter-frequency. Using this, we design a network architecture in blocks, where the input to each block is the output of the two previous blocks along with the original input. We call this a {\it Fibonacci Network}. By training each block on the corresponding frequencies of the signal, we show that Fibonacci Networks can reconstruct arbitrarily high frequencies.

CVSep 24, 2024

Camera Calibration and Stereo via a Single Image of a Spherical Mirror

Nissim Barzilay, Ofek Narinsky, Michael Werman

This paper presents a novel technique for camera calibration using a single view that incorporates a spherical mirror. Leveraging the distinct characteristics of the sphere's contour visible in the image and its reflections, we showcase the effectiveness of our method in achieving precise calibration. Furthermore, the reflection from the mirrored surface provides additional information about the surrounding scene beyond the image frame. Our method paves the way for the development of simple catadioptric stereo systems. We explore the challenges and opportunities associated with employing a single mirrored sphere, highlighting the potential applications of this setup in practical scenarios. The paper delves into the intricacies of the geometry and calibration procedures involved in catadioptric stereo utilizing a spherical mirror. Experimental results, encompassing both synthetic and real-world data, are presented to illustrate the feasibility and accuracy of our approach.

CVMar 24, 2021

On a realization of motion and similarity group equivalence classes of labeled points in $\mathbb R^k$ with applications to computer vision

Steven B. Damelin, David L. Ragozin, Michael Werman

We study a realization of motion and similarity group equivalence classes of $n\geq 1$ labeled points in $\mathbb R^k,\, k\geq 1$ as a metric space with a computable metric. Our study is motivated by applications in computer vision.

CVOct 26, 2020

Using a Supervised Method without supervision for foreground segmentation

Levi Kassel, Michael Werman

Neural networks are a powerful framework for foreground segmentation in video acquired by static cameras, segmenting moving objects from the background in a robust way in various challenging scenarios. The premier methods are those based on supervision requiring a final training stage on a database of tens to hundreds of manually segmented images from the specific static camera. In this work, we propose a method to automatically create an "artificial" database that is sufficient for training the supervised methods so that it performs better than current unsupervised methods. It is based on combining a weak foreground segmenter, compared to the supervised method, to extract suitable objects from the training images and randomly inserting these objects back into a background image. Test results are shown on the test sequences in CDnet.

CVNov 28, 2019

Cameras Viewing Cameras Geometry

Danail Brezov, Michael Werman

A basic problem in computer vision is to understand the structure of a real-world scene given several images of it. Here we study several theoretical aspects of the intra multi-view geometry of calibrated cameras when all that they can reliably recognize is each other. With the proliferation of wearable cameras, autonomous vehicles and drones, the geometry of these multiple cameras is a timely and relevant problem to study.

CVMar 6, 2019

Clear Skies Ahead: Towards Real-Time Automatic Sky Replacement in Video

Tavi Halperin, Harel Cain, Ofir Bibi et al.

Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies, a poorly exposed sky, or simply suffer from an uninteresting or plain looking sky. Professionals may edit these videos using advanced and time-consuming tools unavailable to most users, to replace the sky with a more expressive or imaginative sky. In this work, we propose an algorithm for automatic replacement of the sky region in a video with a different sky, providing nonprofessional users with a simple yet efficient tool to seamlessly replace the sky. The method is fast, achieving close to real-time performance on mobile devices and the user's involvement can remain as limited as simply selecting the replacement sky.

CVDec 21, 2018

Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN

Erez Yahalomi, Michael Chernofsky, Michael Werman

Distal radius fractures are the most common fractures of the upper extremity in humans. As such, they account for a significant portion of the injuries that present to emergency rooms and clinics throughout the world. We trained a Faster R-CNN, a machine vision neural network for object detection, to identify and locate distal radius fractures in anteroposterior X-ray images. We achieved an accuracy of 96\% in identifying fractures and mean Average Precision, mAP, of 0.866. This is significantly more accurate than the detection achieved by physicians and radiologists. These results were obtained by training the deep learning network with only 38 original images of anteroposterior hands X-ray images with fractures. This opens the possibility to detect with this type of neural network rare diseases or rare symptoms of common diseases , where only a small set of diagnosed X-ray images could be collected for each disease.

OCDec 5, 2018

On Min-Max affine approximants of convex or concave real valued functions from $\mathbb R^k$, Chebyshev equioscillation and graphics

Steven B. Damelin, David L. Ragozin, Michael Werman

We study Min-Max affine approximants of a continuous convex or concave function $f:Δ\subset \mathbb R^k\xrightarrow{} \mathbb R$ where $Δ$ is a convex compact subset of $\mathbb R^k$. In the case when $Δ$ is a simplex we prove that there is a vertical translate of the supporting hyperplane in $\mathbb R^{k+1}$ of the graph of $f$ at the vertices which is the unique best affine approximant to $f$ on $Δ$. For $k=1$, this result provides an extension of the Chebyshev equioscillation theorem for linear approximants. Our result has interesting connections to the computer graphics problem of rapid rendering of projective transformations.

CVNov 15, 2018

Sketch based Reduced Memory Hough Transform

Levi Offen, Michael Werman

This paper proposes using sketch algorithms to represent the votes in Hough transforms. Replacing the accumulator array with a sketch (Sketch Hough Transform - SHT) significantly reduces the memory needed to compute a Hough transform. We also present a new sketch, Count Median Update, which works better than known sketch methods for replacing the accumulator array in the Hough Transform.

CVNov 15, 2018

Image declipping with deep networks

Shachar Honig, Michael Werman

We present a deep network to recover pixel values lost to clipping. The clipped area of the image is typically a uniform area of minimum or maximum brightness, losing image detail and color fidelity. The degree to which the clipping is visually noticeable depends on the amount by which values were clipped, and the extent of the clipped area. Clipping may occur in any (or all) of the pixel's color channels. Although clipped pixels are common and occur to some degree in almost every image we tested, current automatic solutions have only partial success in repairing clipped pixels and work only in limited cases such as only with overexposure (not under-exposure) and when some of the color channels are not clipped. Using neural networks and their ability to model natural images allows our neural network, DeclipNet, to reconstruct data in clipped regions producing state of the art results.

CVOct 22, 2018

Two view constraints on the epipoles from few correspondences

Yoni Kasten, Michael Werman

In general it requires at least 7 point correspondences to compute the fundamental matrix between views. We use the cross ratio invariance between corresponding epipolar lines, stemming from epipolar line homography, to derive a simple formulation for the relationship between epipoles and corresponding points. We show how it can be used to reduce the number of required points for the epipolar geometry when some information about the epipoles is available and demonstrate this with a buddy search app.

LGSep 29, 2017

IQ of Neural Networks

Dokhyam Hoshen, Michael Werman

IQ tests are an accepted method for assessing human intelligence. The tests consist of several parts that must be solved under a time constraint. Of all the tested abilities, pattern recognition has been found to have the highest correlation with general intelligence. This is primarily because pattern recognition is the ability to find order in a noisy environment, a necessary skill for intelligent agents. In this paper, we propose a convolutional neural network (CNN) model for solving geometric pattern recognition problems. The CNN receives as input multiple ordered input images and outputs the next image according to the pattern. Our CNN is able to solve problems involving rotation, reflection, color, size and shape patterns and score within the top 5% of human performance.

CVMar 28, 2017

An Epipolar Line from a Single Pixel

Tavi Halperin, Michael Werman

Computing the epipolar geometry from feature points between cameras with very different viewpoints is often error prone, as an object's appearance can vary greatly between images. For such cases, it has been shown that using motion extracted from video can achieve much better results than using a static image. This paper extends these earlier works based on the scene dynamics. In this paper we propose a new method to compute the epipolar geometry from a video stream, by exploiting the following observation: For a pixel p in Image A, all pixels corresponding to p in Image B are on the same epipolar line. Equivalently, the image of the line going through camera A's center and p is an epipolar line in B. Therefore, when cameras A and B are synchronized, the momentary images of two objects projecting to the same pixel, p, in camera A at times t1 and t2, lie on an epipolar line in camera B. Based on this observation we achieve fast and precise computation of epipolar lines. Calibrating cameras based on our method of finding epipolar lines is much faster and more robust than previous methods.

CVSep 17, 2016

A convolutional approach to reflection symmetry

Marcelo Cicconet, Vighnesh Birodkar, Mads Lund et al.

We present a convolutional approach to reflection symmetry detection in 2D. Our model, built on the products of complex-valued wavelet convolutions, simplifies previous edge-based pairwise methods. Being parameter-centered, as opposed to feature-centered, it has certain computational advantages when the object sizes are known a priori, as demonstrated in an ellipse detection application. The method outperforms the best-performing algorithm on the CVPR 2013 Symmetry Detection Competition Database in the single-symmetry case. Code and a new database for 2D symmetry detection is available.

CVJul 26, 2016

Fundamental Matrices from Moving Objects Using Line Motion Barcodes

Yoni Kasten, Gil Ben-Artzi, Shmuel Peleg et al.

Computing the epipolar geometry between cameras with very different viewpoints is often very difficult. The appearance of objects can vary greatly, and it is difficult to find corresponding feature points. Prior methods searched for corresponding epipolar lines using points on the convex hull of the silhouette of a single moving object. These methods fail when the scene includes multiple moving objects. This paper extends previous work to scenes having multiple moving objects by using the "Motion Barcodes", a temporal signature of lines. Corresponding epipolar lines have similar motion barcodes, and candidate pairs of corresponding epipoar lines are found by the similarity of their motion barcodes. As in previous methods we assume that cameras are relatively stationary and that moving objects have already been extracted using background subtraction.

CVApr 17, 2016

Epipolar Geometry Based On Line Similarity

Gil Ben-Artzi, Tavi Halperin, Michael Werman et al.

It is known that epipolar geometry can be computed from three epipolar line correspondences but this computation is rarely used in practice since there are no simple methods to find corresponding lines. Instead, methods for finding corresponding points are widely used. This paper proposes a similarity measure between lines that indicates whether two lines are corresponding epipolar lines and enables finding epipolar line correspondences as needed for the computation of epipolar geometry. A similarity measure between two lines, suitable for video sequences of a dynamic scene, has been previously described. This paper suggests a stereo matching similarity measure suitable for images. It is based on the quality of stereo matching between the two lines, as corresponding epipolar lines yield a good stereo correspondence. Instead of an exhaustive search over all possible pairs of lines, the search space is substantially reduced when two corresponding point pairs are given. We validate the proposed method using real-world images and compare it to state-of-the-art methods. We found this method to be more accurate by a factor of five compared to the standard method using seven corresponding points and comparable to the 8-points algorithm.

CVJun 25, 2015

Camera Calibration from Dynamic Silhouettes Using Motion Barcodes

Gil Ben-Artzi, Yoni Kasten, Shmuel Peleg et al.

Computing the epipolar geometry between cameras with very different viewpoints is often problematic as matching points are hard to find. In these cases, it has been proposed to use information from dynamic objects in the scene for suggesting point and line correspondences. We propose a speed up of about two orders of magnitude, as well as an increase in robustness and accuracy, to methods computing epipolar geometry from dynamic silhouettes. This improvement is based on a new temporal signature: motion barcode for lines. Motion barcode is a binary temporal sequence for lines, indicating for each frame the existence of at least one foreground pixel on that line. The motion barcodes of two corresponding epipolar lines are very similar, so the search for corresponding epipolar lines can be limited only to lines having similar barcodes. The use of motion barcodes leads to increased speed, accuracy, and robustness in computing the epipolar geometry.

CVMay 29, 2015

General Deformations of Point Configurations Viewed By a Pinhole Model Camera

Yirmeyahu Kaminski, Michael Werman

This paper is a theoretical study of the following Non-Rigid Structure from Motion problem. What can be computed from a monocular view of a parametrically deforming set of points? We treat various variations of this problem for affine and polynomial deformations with calibrated and uncalibrated cameras. We show that in general at least three images with quasi-identical two deformations are needed in order to have a finite set of solutions of the points' structure and calculate some simple examples.

CVFeb 2, 2015

Quantum Pairwise Symmetry: Applications in 2D Shape Analysis

Marcelo Cicconet, Davi Geiger, Michael Werman

A pair of rooted tangents -- defining a quantum triangle -- with an associated quantum wave of spin 1/2 is proposed as the primitive to represent and compute symmetry. Measures of the spin characterize how "isosceles" or how "degenerate" these triangles are -- which corresponds to their mirror or parallel symmetry. We also introduce a complex-valued kernel to model probability errors in the parameter space, which is more robust to noise and clutter than the classical model.

CVFeb 2, 2015

Complex-Valued Hough Transforms for Circles

Marcelo Cicconet, Davi Geiger, Michael Werman

This paper advocates the use of complex variables to represent votes in the Hough transform for circle detection. Replacing the positive numbers classically used in the parameter space of the Hough transforms by complex numbers allows cancellation effects when adding up the votes. Cancellation and the computation of shape likelihood via a complex number's magnitude square lead to more robust solutions than the "classic" algorithms, as shown by computational experiments on synthetic and real datasets.

CVDec 3, 2014

Event Retrieval Using Motion Barcodes

Gil Ben-Artzi, Michael Werman, Shmuel Peleg

We introduce a simple and effective method for retrieval of videos showing a specific event, even when the videos of that event were captured from significantly different viewpoints. Appearance-based methods fail in such cases, as appearances change with large changes of viewpoints. Our method is based on a pixel-based feature, "motion barcode", which records the existence/non-existence of motion as a function of time. While appearance, motion magnitude, and motion direction can vary greatly between disparate viewpoints, the existence of motion is viewpoint invariant. Based on the motion barcode, a similarity measure is developed for videos of the same event taken from very different viewpoints. This measure is robust to occlusions common under different viewpoints, and can be computed efficiently. Event retrieval is demonstrated using challenging videos from stationary and hand held cameras.