Marcus Ma

CL
h-index22
6papers
13citations
Novelty44%
AI Score49

6 Papers

MMApr 21
Smiling Regulates Emotion During Traumatic Recollection

Marcus Ma, Emily Zhou, Leonard Ludwig et al.

We study when, where, and why 978 Holocaust survivors smile in video testimonies. We create an automatic smile detection model from facial features with an F1 of 85% and annotate detected smiles under two established taxonomies of smiling. We produce narrative features on 1,083,417 transcript sentences as well as emotional valence from three different modalities: audio, eye gaze, and text transcript. Smiling rates are significantly correlated with specific semantic topics, narrative structures, and temporal syntaxes across the entire corpus. Smiles often occur during periods of intense negative affect; these negative-affect smiles improve the valence trajectory of surrounding sentences significantly across all three modalities. Smiling reduces eye dynamics and blink rates, and the strength of both of these effects is also modulated by narrative valence. Taken together, we conclude that smiling plays a critical role in regulating emotion and social interaction during traumatic recollection.

MMApr 23
Looking Into the Past: Eye Movements Characterize Elements of Autobiographical Recall in Interviews with Holocaust Survivors

Emily Zhou, Marcus Ma, Kleanthis Avramidis et al.

Eye movement and memory retrieval are deeply and bidirectionally intertwined, however existing literature is generally confined to controlled lab settings. We investigate the relationship between eye gaze and memory recall in free-form autobiographical recall, which comprises both autonoetic consciousness -- the ability to mentally place oneself in the past or future -- and various affective states. Using a large video corpus of semi-naturalistic interviews with Holocaust survivors (N = 806), we examine eye movements with respect to episodic, semantic, affective, and temporal dimensions of traumatic and highly emotional autobiographical recall. We observe gaze patterns vary significantly across certain temporal contexts, most prominently in vertical eye movements. We additionally train intra-subject sequence models to predict temporal context of sentences from segments of gaze features, and find that eye movements entirely preceding sentence onset are sufficient for prediction. Our results corroborate prior findings in literature linking eye movements to memory in controlled and semi-structured settings, reinforcing the role of eye gaze in retrieving and constructing memories, especially in highly emotional and remote memory recall.

CLMay 22, 2025Code
Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts

Georgios Chochlakis, Peter Wu, Arjun Bedi et al.

Modeling complex subjective tasks in Natural Language Processing, such as recognizing emotion and morality, is considerably challenging due to significant variation in human annotations. This variation often reflects reasonable differences in semantic interpretations rather than mere noise, necessitating methods to distinguish between legitimate subjectivity and error. We address this challenge by exploring label verification in these contexts using Large Language Models (LLMs). First, we propose a simple In-Context Learning binary filtering baseline that estimates the reasonableness of a document-label pair. We then introduce the Label-in-a-Haystack setting: the query and its label(s) are included in the demonstrations shown to LLMs, which are prompted to predict the label(s) again, while receiving task-specific instructions (e.g., emotion recognition) rather than label copying. We show how the failure to copy the label(s) to the output of the LLM are task-relevant and informative. Building on this, we propose the Label-in-a-Haystack Rectification (LiaHR) framework for subjective label correction: when the model outputs diverge from the reference gold labels, we assign the generated labels to the example instead of discarding it. This approach can be integrated into annotation pipelines to enhance signal-to-noise ratios. Comprehensive analyses, human evaluations, and ecological validity studies verify the utility of LiaHR for label correction. Code is available at https://github.com/gchochla/liahr.

CLMay 23, 2025
Large Language Models Do Multi-Label Classification Differently

Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan et al.

Multi-label classification is prevalent in real-world settings, but the behavior of Large Language Models (LLMs) in this setting is understudied. We investigate how autoregressive LLMs perform multi-label classification, focusing on subjective tasks, by analyzing the output distributions of the models at each label generation step. We find that the initial probability distribution for the first label often does not reflect the eventual final output, even in terms of relative order and find LLMs tend to suppress all but one label at each generation step. We further observe that as model scale increases, their token distributions exhibit lower entropy and higher single-label confidence, but the internal relative ranking of the labels improves. Finetuning methods such as supervised finetuning and reinforcement learning amplify this phenomenon. We introduce the task of distribution alignment for multi-label settings: aligning LLM-derived label distributions with empirical distributions estimated from annotator responses in subjective tasks. We propose both zero-shot and supervised methods which improve both alignment and predictive performance over existing approaches. We find one method -- taking the max probability over all label generation distributions instead of just using the initial probability distribution -- improves both distribution alignment and overall F1 classification without adding any additional computation.

CLDec 15, 2025
Authors Should Label Their Own Documents

Marcus Ma, Cole Johnson, Nolan Bridges et al.

Third-party annotation is the status quo for labeling text, but egocentric information such as sentiment and belief can at best only be approximated by a third-person proxy. We introduce author labeling, an annotation technique where the writer of the document itself annotates the data at the moment of creation. We collaborate with a commercial chatbot with over 20,000 users to deploy an author labeling annotation system. This system identifies task-relevant queries, generates on-the-fly labeling questions, and records authors' answers in real time. We train and deploy an online-learning model architecture for product recommendation with author-labeled data to improve performance. We train our model to minimize the prediction error on questions generated for a set of predetermined subjective beliefs using author-labeled responses. Our model achieves a 537% improvement in click-through rate compared to an industry advertising baseline running concurrently. We then compare the quality and practicality of author labeling to three traditional annotation approaches for sentiment analysis and find author labeling to be higher quality, faster to acquire, and cheaper. These findings reinforce existing literature that annotations, especially for egocentric and subjective beliefs, are significantly higher quality when labeled by the author rather than a third party. To facilitate broader scientific adoption, we release an author labeling service for the research community at https://academic.echogroup.ai.

LGApr 29, 2025
Deep Learning Characterizes Depression and Suicidal Ideation from Eye Movements

Kleanthis Avramidis, Woojae Jeong, Aditya Kommineni et al.

Identifying physiological and behavioral markers for mental health conditions is a longstanding challenge in psychiatry. Depression and suicidal ideation, in particular, lack objective biomarkers, with screening and diagnosis primarily relying on self-reports and clinical interviews. Here, we investigate eye tracking as a potential marker modality for screening purposes. Eye movements are directly modulated by neuronal networks and have been associated with attentional and mood-related patterns; however, their predictive value for depression and suicidality remains unclear. We recorded eye-tracking sequences from 126 young adults as they read and responded to affective sentences, and subsequently developed a deep learning framework to predict their clinical status. The proposed model included separate branches for trials of positive and negative sentiment, and used 2D time-series representations to account for both intra-trial and inter-trial variations. We were able to identify depression and suicidal ideation with an area under the receiver operating curve (AUC) of 0.793 (95% CI: 0.765-0.819) against healthy controls, and suicidality specifically with 0.826 AUC (95% CI: 0.797-0.852). The model also exhibited moderate, yet significant, accuracy in differentiating depressed from suicidal participants, with 0.609 AUC (95% CI 0.571-0.646). Discriminative patterns emerge more strongly when assessing the data relative to response generation than relative to the onset time of the final word of the sentences. The most pronounced effects were observed for negative-sentiment sentences, that are congruent to depressed and suicidal participants. Our findings highlight eye tracking as an objective tool for mental health assessment and underscore the modulatory impact of emotional stimuli on cognitive processes affecting oculomotor control.