Utkarsh Desai

SE
h-index6
5papers
89citations
Novelty48%
AI Score37

5 Papers

CLDec 11, 2025
Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting

Manurag Khullar, Utkarsh Desai, Poorva Malviya et al.

Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. In many such settings, speakers of Indian languages frequently communicate using romanized text rather than native scripts, yet existing research rarely evaluates this orthographic variation using real-world data. We investigate how romanization impacts the reliability of LLMs in a critical domain: maternal and newborn healthcare triage. We benchmark leading LLMs on a real-world dataset of user-generated queries spanning five Indian languages and Nepali. Our results reveal consistent degradation in performance for romanized messages, with F1 scores trailing those of native scripts by 5-12 points. At our partner maternal health organization in India, this gap could cause nearly 2 million excess errors in triage. Crucially, this performance gap by scripts is not due to a failure in clinical reasoning. We demonstrate that LLMs often correctly infer the semantic intent of romanized queries. Nevertheless, their final classification outputs remain brittle in the presence of orthographic noise in romanized inputs. Our findings highlight a critical safety blind spot in LLM-based health systems: models that appear to understand romanized input may still fail to act on it reliably.

SEDec 1, 2021
Monolith to Microservices: Representing Application Software through Heterogeneous Graph Neural Network

Alex Mathai, Sambaran Bandyopadhyay, Utkarsh Desai et al.

Monolithic software encapsulates all functional capabilities into a single deployable unit. But managing it becomes harder as the demand for new functionalities grow. Microservice architecture is seen as an alternate as it advocates building an application through a set of loosely coupled small services wherein each service owns a single functional responsibility. But the challenges associated with the separation of functional modules, slows down the migration of a monolithic code into microservices. In this work, we propose a representation learning based solution to tackle this problem. We use a heterogeneous graph to jointly represent software artifacts (like programs and resources) and the different relationships they share (function calls, inheritance, etc.), and perform a constraint-based clustering through a novel heterogeneous graph neural network. Experimental studies show that our approach is effective on monoliths of different types.

SEFeb 7, 2021
Graph Neural Network to Dilute Outliers for Refactoring Monolith Application

Utkarsh Desai, Sambaran Bandyopadhyay, Srikanth Tamilselvam

Microservices are becoming the defacto design choice for software architecture. It involves partitioning the software components into finer modules such that the development can happen independently. It also provides natural benefits when deployed on the cloud since resources can be allocated dynamically to necessary components based on demand. Therefore, enterprises as part of their journey to cloud, are increasingly looking to refactor their monolith application into one or more candidate microservices; wherein each service contains a group of software entities (e.g., classes) that are responsible for a common functionality. Graphs are a natural choice to represent a software system. Each software entity can be represented as nodes and its dependencies with other entities as links. Therefore, this problem of refactoring can be viewed as a graph based clustering task. In this work, we propose a novel method to adapt the recent advancements in graph neural networks in the context of code to better understand the software and apply them in the clustering task. In that process, we also identify the outliers in the graph which can be directly mapped to top refactor candidates in the software. Our solution is able to improve state-of-the-art performance compared to works from both software engineering and existing graph representation based techniques.

SEOct 12, 2020
Evaluation of Siamese Networks for Semantic Code Search

Raunak Sinha, Utkarsh Desai, Srikanth Tamilselvam et al.

With the increase in the number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common. The accuracy of the results returned by such systems, however, can be low due to 1) limited shared vocabulary between code and user query and 2) inadequate semantic understanding of user query and its relation to code syntax. Siamese networks are well suited to learning such joint relations between data, but have not been explored in the context of code search. In this work, we evaluate Siamese networks for this task by exploring multiple extraction network architectures. These networks independently process code and text descriptions before passing them to a Siamese network to learn embeddings in a common space. We experiment on two different datasets and discover that Siamese networks can act as strong regularizers on networks that extract rich information from code and text, which in turn helps achieve impressive performance on code search beating previous baselines on $2$ programming languages. We also analyze the embedding space of these networks and provide directions to fully leverage the power of Siamese networks for semantic code search.

CLJan 31, 2020
Benchmarking Popular Classification Models' Robustness to Random and Targeted Corruptions

Utkarsh Desai, Srikanth Tamilselvam, Jassimran Kaur et al.

Text classification models, especially neural networks based models, have reached very high accuracy on many popular benchmark datasets. Yet, such models when deployed in real world applications, tend to perform badly. The primary reason is that these models are not tested against sufficient real world natural data. Based on the application users, the vocabulary and the style of the model's input may greatly vary. This emphasizes the need for a model agnostic test dataset, which consists of various corruptions that are natural to appear in the wild. Models trained and tested on such benchmark datasets, will be more robust against real world data. However, such data sets are not easily available. In this work, we address this problem, by extending the benchmark datasets along naturally occurring corruptions such as Spelling Errors, Text Noise and Synonyms and making them publicly available. Through extensive experiments, we compare random and targeted corruption strategies using Local Interpretable Model-Agnostic Explanations(LIME). We report the vulnerabilities in two popular text classification models along these corruptions and also find that targeted corruptions can expose vulnerabilities of a model better than random choices in most cases.