Michael Green

h-index7

6papers

17citations

Novelty33%

AI Score37

Ranked #113,849 of 205,806 authors (top 55%)#36,431 in CV (top 62%)

6 Papers

35.2CVJun 2

VidMsg: A Benchmark for Implicit Message Inference in Short Videos

Issar Tzachor, Michael Green, Rami Ben-Ari

Understanding short online videos involves more than identifying visible objects and actions; video makers often include an underlying message or purpose in the clip. We introduce VidMsg, a benchmark for evaluating implicit message understanding in short, internet-native video clips. VidMsg contains 400 YouTube-derived clips across 9 practical topic areas and 52 fine-grained target messages, covering domains such as career and finance, education, health and well-being, culture, safety, sustainability, and lifestyle. VidMsg is constructed through a message-first pipeline: an LLM first translates target messages into indirect search scenarios, which are used to retrieve candidate clips. Human annotators then retain clips that convey the intended message without being overly explicit. VidMsg is designed primarily for bidirectional message-clip retrieval for scalable applications such as video search and recommendation, where systems must capture holistic video understanding. In addition to retrieval, VidMsg includes a diagnostic multiple-choice QA benchmark, where models select the intended message of a clip from semantically related alternatives. Experiments with contemporary video-language and retrieval models show that strong models often fail on VidMsg, because the task requires pragmatic inference, integration of contextual cues, and discrimination among semantically close messages. We also introduce VidVec-Msg, a baseline method that improves message-oriented retrieval while leaving substantial headroom for future work.

AISep 13, 2022Code

Borch: A Deep Universal Probabilistic Programming Language

Lewis Belcher, Johan Gudmundsson, Michael Green

Ever since the Multilayered Perceptron was first introduced the connectionist community has struggled with the concept of uncertainty and how this could be represented in these types of models. This past decade has seen a lot of effort in trying to join the principled approach of probabilistic modeling with the scalable nature of deep neural networks. While the theoretical benefits of this consolidation are clear, there are also several important practical aspects of these endeavors; namely to force the models we create to represent, learn, and report uncertainty in every prediction that is made. Many of these efforts have been based on extending existing frameworks with additional structures. We present Borch, a scalable deep universal probabilistic programming language, built on top of PyTorch. The code is available for download and use in our repository https://gitlab.com/desupervised/borch.

CVNov 8, 2022

Automatic Error Detection in Integrated Circuits Image Segmentation: A Data-driven Approach

Zhikang Zhang, Bruno Machado Trindade, Michael Green et al. · amazon-science

Due to the complicated nanoscale structures of current integrated circuits(IC) builds and low error tolerance of IC image segmentation tasks, most existing automated IC image segmentation approaches require human experts for visual inspection to ensure correctness, which is one of the major bottlenecks in large-scale industrial applications. In this paper, we present the first data-driven automatic error detection approach targeting two types of IC segmentation errors: wire errors and via errors. On an IC image dataset collected from real industry, we demonstrate that, by adapting existing CNN-based approaches of image classification and image translation with additional pre-processing and post-processing techniques, we are able to achieve recall/precision of 0.92/0.93 in wire error detection and 0.96/0.90 in via error detection, respectively.

CVMar 10, 2025

Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

Michael Green, Matan Levy, Issar Tzachor et al.

We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing a specific small object, in a cluttered scene. The key challenge in this setting is constructing a single image descriptor, for scalable and efficient search, that effectively represents all objects in the image. In this paper, we first analyze the limitations of existing methods on this challenging task and then introduce new benchmarks to support SoIR evaluation. Next, we introduce Multi-object Attention Optimization (MaO), a novel retrieval framework which incorporates a dedicated multi-object pre-training phase. This is followed by a refinement process that leverages attention-based feature extraction with object masks, integrating them into a single unified image descriptor. Our MaO approach significantly outperforms existing retrieval methods and strong baselines, achieving notable improvements in both zero-shot and lightweight multi-object fine-tuning. We hope this work will lay the groundwork and inspire further research to enhance retrieval performance for this highly practical task. Code and Data are available on our project page: $\href{https://pihash2k.github.io/findyourneedle.github.io}{https://pihash2k.github.io/findyourneedle.github.io}$.

CLJan 13, 2022

Making a (Counterfactual) Difference One Rationale at a Time

Mitchell Plyler, Michael Green, Min Chi

Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules: a selector and a classifier with the goal of maximizing the mutual information (MMI) between the "selected" text and the document label. Despite their promises, MMI-based methods often pick up on spurious text patterns and result in models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. From an information theoretic lens, we derive properties of the unaugmented dataset for which our CDA approach would succeed. The effectiveness of CDA is empirically evaluated by comparing against several baselines including an improved MMI-based rationale schema on two multi aspect datasets. Our results show that CDA produces rationales that better capture the signal of interest.

IRAug 24, 2018

Can we leverage rating patterns from traditional users to enhance recommendations for children?

Ion Madrazo Azpiazu, Michael Green, Oghenemaro Anuyah et al.

Recommender algorithms performance is often associated with the availability of sufficient historical rating data. Unfortunately, when it comes to children, this data is seldom available. In this paper, we report on an initial analysis conducted to examine the degree to which data about traditional users, i.e., adults, can be leveraged to enhance the recommendation process for children.