Michael Guerzhoy

CL
h-index: 3
17 papers
205 citations
Novelty: 27%
AI Score: 29

17 Papers

CL Jul 28, 2024
Occam's Razor and Bender and Koller's Octopus

Michael Guerzhoy

We discuss how to teach the debate surrounding Bender and Koller's prominent ACL 2020 paper, "Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data" (Bender and Koller, 2020). We present what we understand to be the main contentions of the paper, and then recommend that students engage with the natural counter-arguments to its claims. We attach the teaching materials that we use to teach this topic to undergraduate students.

CV Jun 1, 2023
How Do ConvNets Understand Image Intensity?

Jackson Kaunismaa, Michael Guerzhoy

Convolutional Neural Networks (ConvNets) usually rely on edge/shape information to classify images. Visualization methods developed over the last decade confirm that ConvNets rely on edge information. We investigate situations where the ConvNet needs to rely on image intensity in addition to shape. We show that the ConvNet relies on image intensity information using visualization.

CL Dec 30, 2024
Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings

Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy

Transformers with causal attention can solve tasks that require positional information without using positional encodings. In this work, we propose and investigate a new hypothesis about how positional information can be stored without using explicit positional encoding. We observe that nearby embeddings are more similar to each other than faraway embeddings, allowing the transformer to potentially reconstruct the positions of tokens. We show that this pattern can occur in both the trained and the randomly initialized Transformer models with causal attention and no positional encodings over a common range of hyperparameters.
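
The similarity pattern described above can be illustrated with a toy model. The sketch below is not the authors' experiment: it assumes that causal attention with uniform weights (a rough stand-in for near-uniform attention at random initialization) turns each position into the running mean of all tokens seen so far. The function names and sizes are hypothetical.

```python
import math
import random


def causal_uniform_average(tokens):
    """Return states where state t is the mean of tokens[0..t].

    This mimics causal attention with uniform weights and no positional
    encodings (an assumption made for illustration only).
    """
    dim = len(tokens[0])
    running = [0.0] * dim
    states = []
    for count, token in enumerate(tokens, start=1):
        running = [r + x for r, x in zip(running, token)]
        states.append([r / count for r in running])
    return states


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity_at_distance(states, distance):
    """Average cosine similarity over all position pairs (i, i + distance)."""
    pairs = [(i, i + distance) for i in range(len(states) - distance)]
    return sum(cosine(states[i], states[j]) for i, j in pairs) / len(pairs)


rng = random.Random(0)
# 32 random token embeddings of dimension 64 (toy sizes).
tokens = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(32)]
states = causal_uniform_average(tokens)

near = mean_similarity_at_distance(states, 1)   # adjacent positions
far = mean_similarity_at_distance(states, 16)   # positions 16 apart
print(f"distance 1: {near:.3f}, distance 16: {far:.3f}")
```

Under this assumption, adjacent states share almost all of their averaged tokens, so they come out more similar than states far apart, which is the kind of signal from which positions could in principle be reconstructed.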

CY Feb 17, 2024
Detecting a Proxy for Potential Comorbid ADHD in People Reporting Anxiety Symptoms from Social Media Data

Claire S. Lee, Noelle Lim, Michael Guerzhoy

We present a novel task that can elucidate the connection between anxiety and ADHD; use Transformers to make progress toward solving a task that is not solvable by keyword-based classifiers; and discuss a method for visualization of our classifier illuminating the connection between anxiety and ADHD presentations. Up to approximately 50% of adults with ADHD may also have an anxiety disorder and approximately 30% of adults with anxiety may also have ADHD. Patients presenting with anxiety may be treated for anxiety without ADHD ever being considered, possibly affecting treatment. We show how data that bears on ADHD that is comorbid with anxiety can be obtained from social media data, and show that Transformers can be used to detect a proxy for possible comorbid ADHD in people with anxiety symptoms. We collected data from anxiety and ADHD online forums (subreddits). We identified posters who first started posting in the Anxiety subreddit and later started posting in the ADHD subreddit as well. We use this subset of the posters as a proxy for people who presented with anxiety symptoms and then became aware that they might have ADHD. We fine-tune a Transformer architecture-based classifier to classify people who started posting in the Anxiety subreddit and then started posting in the ADHD subreddit vs. people who posted in the Anxiety subreddit without later posting in the ADHD subreddit. We show that a Transformer architecture is capable of achieving reasonable results (76% correct for RoBERTa vs. under 60% correct for the best keyword-based model, both with 50% base rate).
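
The proxy labelling described above (Anxiety first, ADHD later) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' pipeline: the record layout `(user, subreddit, timestamp)`, the function name, and the toy data are all assumptions.

```python
def label_posters(posts):
    """Label users by posting history.

    posts: iterable of (user, subreddit, timestamp) tuples (assumed layout).
    Returns {user: 1} for users who posted in Anxiety and only later in ADHD,
    and {user: 0} for users who posted in Anxiety and never in ADHD.
    Users whose first ADHD post precedes their first Anxiety post are skipped.
    """
    first_seen = {}  # (user, subreddit) -> earliest timestamp
    for user, subreddit, timestamp in posts:
        key = (user, subreddit)
        if key not in first_seen or timestamp < first_seen[key]:
            first_seen[key] = timestamp

    labels = {}
    for (user, subreddit), anxiety_time in first_seen.items():
        if subreddit != "Anxiety":
            continue
        adhd_time = first_seen.get((user, "ADHD"))
        if adhd_time is None:
            labels[user] = 0
        elif adhd_time > anxiety_time:
            labels[user] = 1
    return labels


# Toy data with made-up users and integer timestamps.
toy_posts = [
    ("u1", "Anxiety", 1),
    ("u1", "ADHD", 5),
    ("u2", "Anxiety", 2),
    ("u3", "ADHD", 1),
    ("u3", "Anxiety", 4),
    ("u1", "Anxiety", 7),
]
toy_labels = label_posters(toy_posts)
print(toy_labels)
```

Here `u1` gets the positive label, `u2` the negative label, and `u3` is excluded because the ADHD post came first.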

CL Jun 4, 2025
Automatically Detecting Amusing Games in Wordle

Ronaldo Luo, Gary Liang, Cindy Liu et al.

We explore automatically predicting which Wordle games Reddit users find amusing. We scrape approximately 80k Reddit users' reactions to Wordle games, classify the reactions as expressing amusement or not using OpenAI's GPT-3.5 with few-shot prompting, and verify that GPT-3.5's labels roughly correspond to human labels. We then extract features from Wordle games that can predict user amusement. We demonstrate that the features indeed provide a (weak) signal that predicts user amusement as labeled by GPT-3.5. Our results indicate that user amusement at Wordle games can be predicted computationally to some extent. We explore which features of the game contribute to user amusement. We find that user amusement is predictable, indicating a measurable aspect of creativity infused into Wordle games through humor.

LG Feb 16, 2024
Toward Learning Latent-Variable Representations of Microstructures by Optimizing in Spatial Statistics Space

Sayed Sajad Hashemi, Michael Guerzhoy, Noah H. Paulson

In Materials Science, material development involves evaluating and optimizing the internal structures of the material, generically referred to as microstructures. The structure of a microstructure is stochastic, analogously to image textures. A particular microstructure can be well characterized by its spatial statistics, analogously to an image texture being characterized by its response to a Fourier-like filter bank. Material design would benefit from low-dimensional representations of microstructures (Paulson et al., 2017). In this work, we train a Variational Autoencoder (VAE) to produce reconstructions of textures that preserve the spatial statistics of the original texture, while not necessarily reconstructing the same image in data space. We accomplish this by adding a differentiable term to the cost function in order to minimize the distance between the original and the reconstruction in spatial statistics space. Our experiments indicate that it is possible to train a VAE that minimizes the distance in spatial statistics space between the original and the reconstruction of synthetic images. In future work, we will apply the same techniques to microstructures, with the goal of obtaining low-dimensional representations of material microstructures.
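
The idea of measuring distance in spatial statistics space can be sketched with a small example. The sketch assumes that the spatial statistic is a periodic two-point autocorrelation and that the extra cost term is a mean squared error between autocorrelations. The abstract does not specify either choice, and the VAE's KL term is omitted. The code is plain Python, so it is not differentiable as written; a real training loop would need an autodiff framework.

```python
def autocorrelation(img):
    """Periodic two-point autocorrelation of a 2D list of numbers."""
    height, width = len(img), len(img[0])
    stats = []
    for dy in range(height):
        row = []
        for dx in range(width):
            total = sum(
                img[y][x] * img[(y + dy) % height][(x + dx) % width]
                for y in range(height)
                for x in range(width)
            )
            row.append(total / (height * width))
        stats.append(row)
    return stats


def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    count = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count


def combined_loss(original, reconstruction, weight=1.0):
    """Pixel-space error plus a weighted spatial-statistics error (hypothetical form)."""
    pixel_term = mse(original, reconstruction)
    stats_term = mse(autocorrelation(original), autocorrelation(reconstruction))
    return pixel_term + weight * stats_term


# Vertical stripes and the same stripes shifted by one pixel.
stripes = [[1, 0, 1, 0] for _ in range(4)]
shifted = [[0, 1, 0, 1] for _ in range(4)]

pixel_gap = mse(stripes, shifted)
stats_gap = mse(autocorrelation(stripes), autocorrelation(shifted))
print(pixel_gap, stats_gap)
```

The two striped images differ at every pixel, yet their autocorrelations coincide, which illustrates how a reconstruction can match the original in spatial statistics space without matching it in data space.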

CL Dec 9, 2024
Exploring Complex Mental Health Symptoms via Classifying Social Media Data with Explainable LLMs

Kexin Chen, Noelle Lim, Claire Lee et al.

We propose a pipeline for gaining insights into complex diseases by training LLMs on challenging social media text data classification tasks, obtaining explanations for the classification outputs, and performing qualitative and quantitative analysis on the explanations. We report initial results on predicting, explaining, and systematizing the explanations of predicted reports on mental health concerns in people reporting Lyme disease concerns. We report initial results on predicting future ADHD concerns for people reporting anxiety disorder concerns, and demonstrate preliminary results on visualizing the explanations for predicting that a person with anxiety concerns will in the future have ADHD concerns.

SI Oct 11, 2024
Observing the Southern US Culture of Honor Using Large-Scale Social Media Analysis

Juho Kim, Michael Guerzhoy

A "culture of honor" refers to a social system where individuals' status, reputation, and esteem play a central role in governing interpersonal relations. Past works have associated this concept with the United States (US) South and have linked it to various traits such as higher sensitivity to insult, a higher value placed on reputation, and a tendency to react violently to insults. In this paper, we hypothesize and confirm that internet users from the US South, where a culture of honor is more prevalent, are more likely to display a trait predicted by their belonging to a culture of honor. Specifically, we test the hypothesis that US Southerners are more likely to retaliate against personal attacks by personally attacking back. We leverage OpenAI's GPT-3.5 API both to geolocate internet users and to automatically detect whether users are insulting each other. We validate the use of GPT-3.5 by measuring its performance on manually labeled subsets of the data. Our work demonstrates the potential of formulating a hypothesis based on a conceptual framework, operationalizing it in a way that is amenable to large-scale LLM-aided analysis, manually validating the use of the LLM, and drawing a conclusion.

CL Nov 14, 2024
Semantic, Orthographic, and Phonological Biases in Humans' Wordle Gameplay

Jiadong Liang, Adam Kabbara, Jiaying Liu et al.

We show that human players' gameplay in the game of Wordle is influenced by the semantics, orthography, and phonology of the player's previous guesses. We compare actual human players' guesses with near-optimal guesses using NLP techniques. We study human language use in the constrained environment of Wordle, which is situated between natural language use and the artificial word association task.

AI Nov 10, 2024
Barriers to Complexity-Theoretic Proofs that Achieving AGI Using Machine Learning is Intractable

Michael Guerzhoy

A recent paper (van Rooij et al. 2024) claims to have proved that achieving human-like intelligence using learning from data is intractable in a complexity-theoretic sense. We identify that the proof relies on an unjustified assumption about the distribution of (input, output) pairs to the system. We briefly discuss that assumption in the context of two fundamental barriers to repairing the proof: the need to precisely define "human-like," and the need to account for the fact that a particular machine learning system will have particular inductive biases that are key to the analysis.

CV Jun 21, 2024
Effect of Rotation Angle in Self-Supervised Pre-training is Dataset-Dependent

Amy Saranchuk, Michael Guerzhoy

Self-supervised learning for pre-training (SSP) can help the network learn better low-level features, especially when the size of the training set is small. In contrastive pre-training, the network is pre-trained to distinguish between different versions of the input. For example, the network learns to distinguish pairs (original, rotated) of images where the rotated image was rotated by angle $θ$ vs. other pairs of images. In this work, we show that, when training using contrastive pre-training in this way, the angle $θ$ and the dataset interact in interesting ways. We hypothesize, and give some evidence, that, for some datasets, the network can take "shortcuts" for particular rotation angles $θ$ based on the distribution of the gradient directions in the input, possibly avoiding learning features other than edges, but our experiments do not seem to support that hypothesis. We demonstrate experiments on three radiology datasets. We compute the saliency map indicating which pixels were important in the SSP process, and compare the saliency map to the ground truth foreground/background segmentation. Our visualizations indicate that the effects of rotation angles in SSP are dataset-dependent. We believe the distribution of gradient orientations may play a role in this, but our experiments so far are inconclusive.

AI Jun 14, 2024
Predicting User Perception of Move Brilliance in Chess

Kamron Zaidi, Michael Guerzhoy

AI research in chess has been primarily focused on producing stronger agents that can maximize the probability of winning. However, there is another aspect to chess that has largely gone unexamined: its aesthetic appeal. Specifically, there exists a category of chess moves called "brilliant" moves. These moves are appreciated and admired by players for their high intellectual aesthetics. We demonstrate the first system for classifying chess moves as brilliant. The system uses a neural network that takes as input the output of a chess engine as well as features that describe the shape of the game tree. The system achieves an accuracy of 79% (with 50% base rate), a PPV of 83%, and an NPV of 75%. We demonstrate that what humans perceive as "brilliant" moves is not merely the best possible move. We show that a move is more likely to be predicted as brilliant, all things being equal, if a weaker engine considers it lower-quality (for the same rating by a stronger engine). Our system opens avenues for computer chess engines to (appear to) display human-like brilliance, and, hence, creativity.
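
For readers less familiar with the reported metrics, the sketch below shows how accuracy, PPV (positive predictive value), NPV (negative predictive value), and base rate are computed from a binary confusion matrix. The counts are hypothetical and are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fp: moves predicted brilliant that are / are not labeled brilliant.
    tn/fn: moves predicted not brilliant that are not / are labeled brilliant.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of correct predictions
        "ppv": tp / (tp + fp),           # precision of "brilliant" predictions
        "npv": tn / (tn + fn),           # precision of "not brilliant" predictions
        "base_rate": (tp + fn) / total,  # fraction of truly brilliant moves
    }


# Hypothetical counts on a balanced set of 100 moves (50 brilliant, 50 not).
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a 50% base rate, a classifier that always predicts the same class would reach 50% accuracy, which is the reference point for the accuracy quoted in the abstract.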

LG Feb 6, 2024
Breaking Symmetry When Training Transformers

Chunsheng Zuo, Michael Guerzhoy

As we show in this paper, the prediction for output token $n+1$ of Transformer architectures without one of the mechanisms of positional encodings and causal attention is invariant to permutations of input tokens $1, 2, ..., n-1$. Usually, both mechanisms are employed and the symmetry with respect to the input tokens is broken. Recently, it has been shown that one can train Transformers without positional encodings. This must be enabled by the causal attention mechanism. In this paper, we elaborate on the argument that the causal connection mechanism must be responsible for the fact that Transformers are able to model input sequences where the order is important. Vertical "slices" of Transformers are all encouraged to represent the same location $k$ in the input sequence. We hypothesize that residual connections contribute to this phenomenon, and demonstrate evidence for this.
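
The permutation invariance mentioned at the start of the abstract can be checked numerically in a reduced setting. The sketch below assumes a single attention layer with identity query, key, and value projections, no positional encodings, and no causal mask. It is an illustration of the symmetry, not the paper's proof, and all names are hypothetical.

```python
import math
import random


def attend(query, keys, values):
    """Softmax dot-product attention for a single query vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    normalizer = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / normalizer
        for d in range(dim)
    ]


rng = random.Random(0)
# Six random token embeddings of dimension 8; the last token acts as the query.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
last = tokens[-1]

# Permute every token except the last one.
shuffled = tokens[:-1]
rng.shuffle(shuffled)
shuffled.append(last)

out_original = attend(last, tokens, tokens)
out_shuffled = attend(last, shuffled, shuffled)
max_gap = max(abs(a - b) for a, b in zip(out_original, out_shuffled))
print(f"largest difference after shuffling: {max_gap:.2e}")
```

Because the output is a weighted sum over a set of (key, value) pairs, reordering the earlier tokens changes the result only by floating-point rounding. Positional encodings or a causal mask applied across several layers are the ingredients that break this symmetry.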

CL Dec 12, 2023
Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency

Pavlos Constas, Vikram Rawal, Matthew Honorio Oliveira et al.

We propose a reinforcement learning (RL)-based system that would automatically prescribe a hypothetical patient medication that may help the patient with their mental health-related speech disfluency, and adjust the medication and the dosages in response to zero-cost frequent measurement of the fluency of the patient. We demonstrate the components of the system: a module that detects and evaluates speech disfluency on a large dataset we built, and an RL algorithm that automatically finds good combinations of medications. To support the two modules, we collect data on the effect of psychiatric medications for speech disfluency from the literature, and build a plausible patient simulation system. We demonstrate that the RL system is, under some circumstances, able to converge to a good medication regime. We collect and label a dataset of people with possible speech disfluency and demonstrate our methods using that dataset. Our work is a proof of concept: we show that there is promise in the idea of using automatic data collection to address speech disfluency.

CV May 17, 2023
Automatic Photo Orientation Detection with Convolutional Neural Networks

Ujash Joshi, Michael Guerzhoy

We apply convolutional neural networks (CNNs) to the problem of image orientation detection in the context of determining the correct orientation (from 0, 90, 180, and 270 degrees) of a consumer photo. The problem is especially important for digitizing analog photographs. We substantially improve on the published state of the art in terms of the performance on one of the standard datasets, and test our system on a more difficult large dataset of consumer photos. We use Guided Backpropagation to obtain insights into how our CNN detects photo orientation, and to explain its mistakes.
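
Orientation detection as posed above is a four-way classification problem. One common way to obtain labeled examples, assumed here rather than taken from the abstract, is to rotate correctly oriented photos by each of the four angles and use the angle as the label. The sketch represents an image as a 2D list and uses hypothetical function names.

```python
def rotate90(img):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def orientation_examples(upright):
    """Return (rotated_image, label) pairs for 0, 90, 180, and 270 degrees."""
    examples = []
    current = upright
    for label in (0, 90, 180, 270):
        examples.append((current, label))
        current = rotate90(current)
    return examples


# A tiny 2x2 "image" standing in for a correctly oriented photo.
upright = [[1, 2], [3, 4]]
examples = orientation_examples(upright)
for image, label in examples:
    print(label, image)
```

A classifier trained on such pairs predicts one of four labels, and undoing the predicted rotation restores the upright photo.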

CL May 17, 2023
Boosting Local Spectro-Temporal Features for Speech Analysis

Michael Guerzhoy

We introduce the problem of phone classification in the context of speech recognition, and explore several sets of local spectro-temporal features that can be used for phone classification. In particular, we present some preliminary results for phone classification using two sets of features that are commonly used for object detection: Haar features and SVM-classified Histograms of Gradients (HoG).

CV Mar 8, 2020
Salient Facial Features from Humans and Deep Neural Networks

Shanmeng Sun, Wei Zhen Teoh, Michael Guerzhoy

In this work, we explore the features that are used by humans and by convolutional neural networks (ConvNets) to classify faces. We use Guided Backpropagation (GB) to visualize the facial features that influence the output of a ConvNet the most when identifying specific individuals; we explore how to best use GB for that purpose. We use a human intelligence task to find out which facial features humans find to be the most important for identifying specific individuals. We explore the differences between the saliency information gathered from humans and from ConvNets. Humans develop biases in employing available information on facial features to discriminate across faces. Studies show these biases are influenced both by neurological development and by each individual's social experience. In recent years the computer vision community has achieved human-level performance in many face processing tasks with deep neural network-based models. These face processing systems are also subject to systematic biases due to model architectural choices and training data distribution.