Manav Chaudhary

CL
h-index12
5papers
59citations
Novelty39%
AI Score26

5 Papers

CLJun 30, 2023
iMETRE: Incorporating Markers of Entity Types for Relation Extraction

N Harsha Vardhan, Manav Chaudhary

Sentence-level relation extraction (RE) aims to identify the relationship between 2 entities given a contextual sentence. While there have been many attempts to solve this problem, the current solutions have a lot of room to improve. In this paper, we approach the task of relationship extraction in the financial dataset REFinD. Our approach incorporates typed entity markers representations and various models finetuned on the dataset, which has allowed us to achieve an F1 score of 69.65% on the validation set. Through this paper, we discuss various approaches and possible limitations.

CLDec 12, 2024
Towards Understanding the Robustness of LLM-based Evaluations under Perturbations

Manav Chaudhary, Harshit Gupta, Savita Bhat et al.

Traditional evaluation metrics like BLEU and ROUGE fall short when capturing the nuanced qualities of generated text, particularly when there is no single ground truth. In this paper, we explore the potential of Large Language Models (LLMs), specifically Google Gemini 1, to serve as automatic evaluators for non-standardized metrics in summarization and dialog-based tasks. We conduct experiments across multiple prompting strategies to examine how LLMs fare as quality evaluators when compared with human judgments on the SummEval and USR datasets, asking the model to generate both a score as well as a justification for the score. Furthermore, we explore the robustness of the LLM evaluator by using perturbed inputs. Our findings suggest that while LLMs show promise, their alignment with human evaluators is limited, they are not robust against perturbations and significant improvements are required for their standalone use as reliable evaluators for subjective metrics.

SEApr 23, 2025
Can Code Language Models Learn Clarification-Seeking Behaviors?

Jie JW Wu, Manav Chaudhary, Davit Abrahamyan et al. · stanford

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, a gap remains between their output and the problem-solving strategies of human developers. Unlike humans, who spend substantial time disambiguating requirements through iterative dialogue, LLMs often generate code despite ambiguities in natural language requirements, leading to unreliable solutions. Different from prior work, we study whether a Code LLM can be fine-tuned to learn clarification-seeking behavior. While recent work has focused on LLM-based agents for iterative code generation, we argue that the ability to recognize and query ambiguous requirements should be intrinsic to the models themselves, especially in agentic AI where models and humans collaborate. We present ClarifyCoder, a framework with synthetic data generation and instruction-tuning that fine-tunes an LLM to identify ambiguities and request clarification before code generation. Our approach has two components: (1) a data synthesis technique that augments programming datasets with scenarios requiring clarification to generate clarification-aware training data, and (2) a fine-tuning strategy that teaches models to prioritize seeking clarification over immediate code generation when faced with incomplete or ambiguous requirements. We also provide an empirical analysis of integrating ClarifyCoder with standard fine-tuning for joint optimization of clarification-awareness and coding ability. Experimental results show that ClarifyCoder achieves a 63% communication rate (40% absolute increase) and a 52% good question rate (30% absolute increase) on ambiguous tasks, significantly improving LLMs' communication capabilities while maintaining code generation performance.

CLMay 20, 2025
AutoRev: Multi-Modal Graph Retrieval for Automated Peer-Review Generation

Maitreya Prafulla Chitale, Ketaki Mangesh Shetye, Harshit Gupta et al.

Enhancing the quality and efficiency of academic publishing is critical for both authors and reviewers, as research papers are central to scholarly communication and a major source of high-quality content on the web. To support this goal, we propose AutoRev, an automatic peer-review system designed to provide actionable, high-quality feedback to both reviewers and authors. AutoRev leverages a novel Multi-Modal Retrieval-Augmented Generation (RAG) framework that combines textual and graphical representations of academic papers. By modelling documents as graphs, AutoRev effectively retrieves the most pertinent information, significantly reducing the input context length for LLMs and thereby enhancing their review generation capabilities. Experimental results show that AutoRev outperforms state-of-the-art baselines by up to 58.72% and demonstrates competitive performance in human evaluations against ground truth reviews. We envision AutoRev as a powerful tool to streamline the peer-review workflow, alleviating challenges and enabling scalable, high-quality scholarly publishing. By guiding both authors and reviewers, AutoRev has the potential to accelerate the dissemination of quality research on the web at a larger scale. Code will be released upon acceptance.

CLMay 18, 2024
BrainStorm @ iREL at #SMM4H 2024: Leveraging Translation and Topical Embeddings for Annotation Detection in Tweets

Manav Chaudhary, Harshit Gupta, Vasudeva Varma

The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iRELs approach to the SMM4H 2024 Shared Task, leveraging the inherent topical information in tweets, we propose a novel approach to identify and classify annotations, aiming to enhance the trustworthiness of annotated data.