AI · Jul 5, 2024 · Code
Smart Vision-Language Reasoners
Denisa Roberts, Lucas Roberts
In this article, we investigate vision-language models (VLMs) as reasoners. The ability to form abstractions underlies mathematical reasoning, problem-solving, and other Math AI tasks. Several formalisms have been proposed for the abstractions and skills that humans and intelligent systems use for reasoning. Furthermore, human reasoning is inherently multimodal, and as such, we focus our investigations on multimodal AI. We employ the abstractions given in the SMART task (Simple Multimodal Algorithmic Reasoning Task), introduced by Cherian et al. (2022), as meta-reasoning and problem-solving skills along eight axes: math, counting, path, measure, logic, algebra, spatial, and pattern. We investigate the ability of vision-language models to reason along these axes and seek avenues of improvement. Adding composite representations with vision-language cross-attention enabled the model to learn multimodal representations adaptively from fused, frozen pretrained backbones, which improved visual grounding. Furthermore, proper hyperparameter and other training choices led to strong improvements (up to $48\%$ gain in accuracy) on the SMART task, further underscoring the power of deep multimodal learning. The smartest VLM, which includes a novel QF multimodal layer, improves upon the best previous baselines in every one of the eight fundamental reasoning skills. End-to-end code is available at https://github.com/smarter-vlm/smarter.
LG · Feb 2, 2023
adSformers: Personalization from Short-Term Sequences and Diversity of Representations in Etsy Ads
Alaa Awad, Denisa Roberts, Eden Dolev et al.
In this article, we present a general approach to personalizing ads through encoding and learning from variable-length sequences of recent user actions and diverse representations. To this end we introduce a three-component module called the adSformer diversifiable personalization module (ADPM) that learns a dynamic user representation. We illustrate the module's effectiveness and flexibility by personalizing the Click-Through Rate (CTR) and Post-Click Conversion Rate (PCCVR) models used in sponsored search. The first component of the ADPM, the adSformer encoder, includes a novel adSformer block which learns the most salient sequence signals. ADPM's second component enriches the learned signal through visual, multimodal, and other pretrained representations. Lastly, the third ADPM "learned on the fly" component further diversifies the signal encoded in the dynamic user representation. The ADPM-personalized CTR and PCCVR models, henceforth referred to as adSformer CTR and adSformer PCCVR, outperform the CTR and PCCVR production baselines by $+2.66\%$ and $+2.42\%$, respectively, in offline Area Under the Receiver Operating Characteristic Curve (ROC-AUC). Following the robust online gains in A/B tests, Etsy Ads deployed the ADPM-personalized sponsored search system to $100\%$ of traffic as of February 2023.
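The offline metric named in this abstract, ROC-AUC, can be read as the probability that a randomly chosen positive example (for instance, a click) is scored above a randomly chosen negative one. The following is a minimal sketch of that pairwise definition on toy data. The function names, labels, and scores are illustrative assumptions and do not come from the paper, and the abstract does not say whether its reported gains are relative or absolute, so the relative-lift helper is only one possible reading.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_lift(candidate_auc, baseline_auc):
    """Relative change in percent (assumption: gains are measured relative to the baseline)."""
    return 100.0 * (candidate_auc - baseline_auc) / baseline_auc


# Toy data (hypothetical): 1 = click, 0 = no click.
labels = [1, 0, 1, 0, 1, 0]
baseline_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]
candidate_scores = [0.9, 0.5, 0.6, 0.3, 0.7, 0.4]

baseline_auc = roc_auc(labels, baseline_scores)
candidate_auc = roc_auc(labels, candidate_scores)
print(baseline_auc, candidate_auc, relative_lift(candidate_auc, baseline_auc))
```

The quadratic loop keeps the definition visible; a production evaluation would normally use a sorted, rank-based implementation instead.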
SE · Sep 30, 2025 · Code
Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval?
Lucas Roberts, Denisa Roberts
Code search is an important information retrieval application. Benefits of better code search include faster onboarding of new developers, reduced software maintenance, and easier understanding of large repositories. Despite improvements in search algorithms and search benchmarks, the domain of code search has lagged behind. One reason is the high cost of human annotation for code queries and answers. While humans may annotate search results in general text QA systems, code annotations require specialized knowledge of a programming language (PL), as well as domain-specific software engineering knowledge. In this work, we study the use of Large Language Models (LLMs) to retrieve code at the level of functions and to generate annotations for code search results. We examine the impact of the retriever representation (sparse vs. semantic), programming language, and LLM by comparing human annotations across several popular languages (C, Java, JavaScript, Go, and Python). We focus on repositories that implement common data structures likely to be implemented in any PL. For the same human annotations, we compare several LLM-as-a-Judge models to evaluate programming language and other affinities between LLMs. We find that the chosen retriever and PL exhibit affinities that can be leveraged to improve alignment of human and AI relevance determinations, with significant performance implications. We also find differences in representation (sparse vs. semantic) across PLs that impact alignment of human and AI relevance determinations. We propose using transpilers to bootstrap scalable code search benchmark datasets in other PLs and, in a case study, demonstrate that human-AI relevance agreement rates largely match the (worst case) human-human agreement under study. The application code used in this work is available at https://github.com/rlucas7/code-searcher/.
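To make the notion of human-AI relevance agreement concrete, the sketch below computes a raw agreement rate and Cohen's kappa between two sets of binary relevance labels. This is an illustration only: the abstract does not state which agreement statistic the authors use, and the function names and toy labels here are hypothetical.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which the two annotators give the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_observed = agreement_rate(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labeled independently at their own rates.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators always use the same single label
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy relevance judgments (hypothetical): 1 = relevant, 0 = not relevant.
human = [1, 1, 0, 0, 1, 0, 1, 0]
llm_judge = [1, 0, 0, 0, 1, 0, 1, 1]

print(agreement_rate(human, llm_judge))  # 0.75
print(cohen_kappa(human, llm_judge))     # 0.5
```

The same two functions apply unchanged to a human-human pair, which is how an LLM judge can be placed against a human-human baseline.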
CV · May 22, 2023
Efficient Large-Scale Visual Representation Learning And Evaluation
Eden Dolev, Alaa Awad, Denisa Roberts et al.
Efficiently learning visual representations of items is vital for large-scale recommendations. In this article, we compare several pretrained efficient backbone architectures from both the convolutional neural network (CNN) and the vision transformer (ViT) families. We describe challenges in e-commerce vision applications at scale and highlight methods to efficiently train, evaluate, and serve visual representations. We present ablation studies evaluating visual representations in several downstream tasks. To this end, we present a novel multilingual text-to-image generative offline evaluation method for visually similar recommendation systems. Finally, we include online results from machine learning systems deployed in production on a large-scale e-commerce platform.
LG · Mar 18, 2019
Neural Networks for Lorenz Map Prediction: A Trip Through Time
Denisa Roberts
This article revisits the Lorenz dynamical system and reports current state-of-the-art results for one-step-ahead forecasting of Lorenz trajectories. Multitask learning is shown to help with learning the hard-to-learn z trajectory. The article reflects on the evolution of neural networks as measured by prediction performance on this canonical task.
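For readers unfamiliar with the task, the sketch below shows one way to generate a Lorenz trajectory and turn it into one-step-ahead (input, target) pairs for a forecasting model. It uses the classical parameter values (sigma = 10, rho = 28, beta = 8/3) and a fourth-order Runge-Kutta integrator. The step size, starting point, and function names are assumptions for illustration; the article's own data generation setup may differ.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations for state (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(n_steps, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Return a list of n_steps + 1 states, beginning at `start` (assumed values)."""
    trajectory = [start]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory


def one_step_pairs(trajectory):
    """Pair each state with its successor: the one-step-ahead forecasting targets."""
    return list(zip(trajectory[:-1], trajectory[1:]))


trajectory = simulate(1000)
pairs = one_step_pairs(trajectory)
inputs, targets = zip(*pairs)
print(len(pairs), inputs[0], targets[0])
```

In a multitask setup, a single model would predict all three target coordinates jointly, rather than fitting a separate model for the z coordinate alone.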
ST · Jan 20, 2018
A Second Order Cumulant Spectrum Test That a Stochastic Process is Strictly Stationary and a Step Toward a Test for Graph Signal Strict Stationarity
Denisa Roberts, Douglas Patterson
This article develops a frequency-domain statistical test for the null hypothesis of strict stationarity of a discrete-time stochastic process. When the null hypothesis is true, the second-order cumulant spectrum is zero at all the discrete Fourier frequency pairs in the principal domain. The test uses a window-averaged sample estimate of the second-order cumulant spectrum to build a test statistic with an asymptotic complex standard normal distribution. We derive the test statistic, study the properties of the test, and demonstrate its application using $^{137}$Cs gamma ray decay data. Future areas of research include testing for strict stationarity of graph signals, with applications in learning convolutional neural networks on graphs, denoising, and inpainting.
CO · Oct 23, 2017
An Expectation Maximization Framework for Yule-Simon Preferential Attachment Models
Lucas Roberts, Denisa Roberts
In this paper, we develop an Expectation Maximization (EM) algorithm to estimate the parameter of a Yule-Simon distribution. The Yule-Simon distribution exhibits the "rich get richer" effect, whereby an 80-20 type of rule tends to dominate. These distributions are ubiquitous in industrial settings. The EM algorithm presented provides both frequentist and Bayesian estimates of the $λ$ parameter. By placing the estimation method within the EM framework, we are able to derive standard errors of the resulting estimate. Additionally, we prove convergence of the Yule-Simon EM algorithm and study the rate of convergence. An explicit, closed-form solution for the rate of convergence of the algorithm is given. Applications, including graph node degree distribution estimation, are listed.
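As an illustration of how such an EM scheme can work, the sketch below uses the standard mixture representation of the Yule-Simon distribution: draw W from an exponential distribution with rate λ, then draw K from a geometric distribution on {1, 2, ...} with success probability exp(-W). Treating W as the latent variable, the E-step expectation is E[W | K = k, λ] = sum over j = 1..k of 1 / (λ + j), and the M-step is λ = n / (sum of those expectations). This is a frequentist sketch derived under that representation; it is an assumption that it matches the paper's derivation, and the function names, starting value, and tolerances are hypothetical.

```python
import math
import random
from collections import Counter


def sample_yule_simon(lam, rng):
    """Draw one Yule-Simon variate via the exponential-geometric mixture.

    Intended for moderate lam; very small lam produces astronomically large draws.
    """
    w = rng.expovariate(lam)
    if w <= 0.0:
        return 1  # success probability exp(-w) is 1, so the first trial succeeds
    log_q = math.log1p(-math.exp(-w))  # log of the geometric failure probability
    u = 1.0 - rng.random()             # uniform on (0, 1], keeps log(u) finite
    return int(math.log(u) / log_q) + 1


def em_yule_simon(data, lam=1.0, tol=1e-10, max_iter=1000):
    """EM estimate of lam; each pass applies one E-step and one M-step."""
    counts = Counter(data)
    n = len(data)
    for _ in range(max_iter):
        # E-step: expected latent W for each observed value k, weighted by frequency.
        expected_w = sum(
            c * sum(1.0 / (lam + j) for j in range(1, k + 1))
            for k, c in counts.items()
        )
        # M-step: maximize the complete-data likelihood n*log(lam) - lam*sum(W).
        new_lam = n / expected_w
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


rng = random.Random(0)
sample = [sample_yule_simon(2.0, rng) for _ in range(5000)]
print(em_yule_simon(sample))  # should land near the true value 2.0
```

The fixed point of this update satisfies n/λ = sum over observations of sum over j = 1..k of 1/(λ + j), which is the Yule-Simon score equation, so the iteration targets the maximum likelihood estimate when one exists. A sample consisting only of ones has no finite maximizer, and the iterates grow without bound in that case.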