Ayush Raj

CL
h-index: 31
4 papers
10 citations
Novelty: 55%
AI Score: 42

4 Papers

CL · May 10
Quantifying the Utility of User Simulators for Building Collaborative LLM Assistants

Joseph Suh, Ayush Raj, Minwoo Kang et al.

User simulators are increasingly leveraged to build interactive AI assistants, yet how to measure the quality of these simulators remains an open question. In this work, we show how simulator quality can be quantified in terms of its downstream utility: how an LLM assistant trained with this user simulator performs in the wild when interacting with real humans. In a controlled experiment where only the user simulator varies, we train LLM assistants via reinforcement learning against a spectrum of simulators, from an LLM prompted to role-play a user to one fine-tuned on human utterances from WildChat. As evaluation, we measure pairwise win rates in a user study with 283 participants and on WildBench, a benchmark derived from real human-AI conversations. Training against the role-playing LLM yields an assistant statistically indistinguishable from the initial assistant in our user study (51% win rate), whereas training against the fine-tuned simulator yields significant gains (58% over the initial assistant and 57% over the one trained against the role-playing simulator). Closer inspection reveals three further patterns: methods for making role-playing LLMs more realistic (e.g., persona conditioning) improve trained assistants but do not close the gap to the fine-tuned simulator; scaling the simulator's model size benefits the fine-tuned simulator but yields no gain for role-playing ones; and assistants trained against role-playing simulators fail to generalize when paired with other simulators at test time, while the one trained against the fine-tuned simulator does. Together, these results argue for grounding user simulators in real human behavior and measuring their quality by their downstream effect on real users.
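
As a rough illustration of what "statistically indistinguishable" can mean for a pairwise win rate, the sketch below runs an exact two-sided binomial test against a 50% null. The abstract does not say which test the authors used or how many comparisons were collected, so the function name `win_rate_p_value` and the counts in the usage lines are hypothetical.

```python
from math import comb


def win_rate_p_value(wins: int, total: int) -> float:
    """Two-sided exact binomial p-value for H0: true win rate = 0.5.

    Assumes independent comparisons with no ties (an assumption,
    not something the abstract states).
    """
    if not 0 <= wins <= total or total == 0:
        raise ValueError("need 0 <= wins <= total and total > 0")
    # Under H0 the distribution is symmetric, so double the smaller tail.
    smaller = min(wins, total - wins)
    tail = sum(comb(total, k) for k in range(smaller + 1)) / 2 ** total
    return min(1.0, 2 * tail)


# Hypothetical counts, chosen only to show the call pattern.
p_near_even = win_rate_p_value(51, 100)
p_lopsided = win_rate_p_value(90, 100)
print(f"51/100 wins: p = {p_near_even:.3f}")
print(f"90/100 wins: p = {p_lopsided:.2e}")
```

With the same win rate, the p-value shrinks as the number of comparisons grows, which is why a win rate alone does not determine significance.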

CL · Apr 16, 2025
Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

Minwoo Kang, Suhong Moon, Seung Hyeong Lee et al.

Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses to various surveys and polls. However, the questions in these surveys usually reflect socially understood attitudes: the patterns of attitudes of old/young, liberal/conservative, as understood by both members and non-members of those groups. It is not clear whether the LLM binding is deep, meaning the LLM answers as a member of a particular in-group would, or shallow, meaning the LLM responds as an out-group member believes an in-group member would. To explore this difference, we use questions that expose known in-group/out-group biases. This level of fidelity is critical for applying LLMs to various political science studies, including timely topics on polarization dynamics, inter-group conflict, and democratic backsliding. To this end, we propose a novel methodology for constructing virtual personas with synthetic user "backstories" generated as extended, multi-turn interview transcripts. This approach is justified by the theory of narrative identity, which argues that personality at the highest level is constructed from self-narratives. Our generated backstories are longer, rich in detail, and consistent in authentically describing a singular individual, compared to previous methods. We show that virtual personas conditioned on our backstories closely replicate human response distributions (up to an 87% improvement as measured by Wasserstein distance) and produce effect sizes that closely match those observed in the original studies of in-group/out-group biases. Altogether, our work extends the applicability of LLMs beyond estimating socially understood responses, enabling their use in a broader range of human studies.
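
For readers unfamiliar with the metric, here is a minimal sketch of the 1-Wasserstein distance between two response distributions over the same ordered answer options. It assumes unit spacing between adjacent options and normalized inputs; the abstract does not specify the paper's exact computation, and the function name and example distributions are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def wasserstein_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """W1 distance between two distributions on the same ordered options.

    Assumes adjacent options are one unit apart and that p and q each
    sum to 1. In one dimension, W1 is the summed absolute difference
    of the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same options")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical 5-point survey item: human responses vs. simulated personas.
human = [0.10, 0.20, 0.40, 0.20, 0.10]
simulated = [0.05, 0.15, 0.40, 0.25, 0.15]
print(wasserstein_1d(human, simulated))
```

A smaller value means the simulated responses need less probability mass moved, over shorter distances, to match the human distribution.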

CV · Nov 8, 2024
Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition

Abhisek Ray, Ayush Raj, Maheshkumar H. Kolekar

Extracting multiscale contextual information and higher-order correlations among skeleton sequences using Graph Convolutional Networks (GCNs) alone is inadequate for effective action classification. Hypergraph convolution addresses these issues but cannot harness long-range dependencies. The transformer proves to be effective in capturing these dependencies and making complex contextual features accessible. We propose an Autoregressive Adaptive HyperGraph Transformer (AutoregAd-HGformer) model for in-phase (autoregressive and discrete) and out-phase (adaptive) hypergraph generation. The vector-quantized in-phase hypergraph, equipped with powerful autoregressive learned priors, produces a more robust and informative representation suitable for hyperedge formation. The out-phase hypergraph generator provides a model-agnostic hyperedge learning technique to align the attributes with the input skeleton embedding. The hybrid (supervised and unsupervised) learning in AutoregAd-HGformer explores action-dependent features along the spatial, temporal, and channel dimensions. Extensive experimental results and an ablation study indicate the superiority of our model over state-of-the-art hypergraph architectures on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.

CV · Jan 4, 2022
HWRCNet: Handwritten Word Recognition in JPEG Compressed Domain using CNN-BiLSTM Network

Bulla Rajesh, Abhishek Kumar Gupta, Ayush Raj et al.

Handwritten word recognition from document images using deep learning is an active research area in Document Image Analysis and Recognition. In the era of big data, more and more documents are generated and archived in compressed form for better storage and transmission efficiency, so recognizing words directly in the compressed domain, without decompression, becomes a challenging problem. Traditional methods decompress the documents and then apply learning algorithms to them; novel algorithms are therefore needed to apply learning techniques directly to the compressed representations. In this direction, this paper proposes HWRCNet, a novel model for handwritten word recognition directly in the compressed domain, specifically focusing on the JPEG format. The proposed model combines a Convolutional Neural Network (CNN) with a Bi-Directional Long Short-Term Memory (BiLSTM) based Recurrent Neural Network (RNN). We train the model on JPEG-compressed word images and observe promising performance, with 89.05% word recognition accuracy and a 13.37% character error rate.
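
The two reported metrics can be sketched as follows, using the common definitions: word accuracy as the fraction of exact matches, and character error rate as total Levenshtein edit distance divided by total reference characters. The abstract does not state the exact formulas the authors used, so these definitions, the function names, and the sample words are assumptions.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # delete r
                            curr[j - 1] + 1,          # insert h
                            prev[j - 1] + (r != h)))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total edit distance divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def word_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of predicted words that match the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Hypothetical ground-truth words and recognizer outputs.
truth = ["word", "image", "jpeg", "text"]
predicted = ["ward", "image", "jpeg", "test"]
print(word_accuracy(truth, predicted), character_error_rate(truth, predicted))
```

Under these definitions a single wrong character costs a whole word in accuracy but only one character in the error rate, so the two figures need not add up to 100%.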