Anmol Singhal

SE
h-index15
5papers
89citations
Novelty37%
AI Score42

5 Papers

57.8SEJun 1
Report on the Designing Accountable Software Systems Workshop

Catherine Albiston, Travis Breaux, Kat Dearstyne et al.

The Workshop on Designing Accountable Software Systems (DASS) was convened in November 2024 with support from the U.S. National Science Foundation to engage a wide range of current and future stakeholders from government, academia, and industry on the cross-disciplinary topic of accountability in software systems. Over two days, attendees engaged in a series of panels, invited talks, and breakout sessions covering: (1) the dimensions of accountability, including legal compliance as well as business and societal aspects and drivers; (2) a conceptual model of the various structures needed to realize accountability; (3) the sources of legal requirements that affect software; (4) the operationalization of legal requirements in software; (5) the requirements to preserve evidence needed to conduct investigations; and (6) a range of challenges and contextual factors beyond software that affect why some accountability structures succeed, while others fail. The workshop was conducted as a collaborative systematization of knowledge that culminated in several research directions. The findings include the importance of clarifying definitions and responsibilities within accountable organizations, which can affect whether those researching accountability are making assumptions that limit the generalizability of findings. Further research was also identified as needed to study the ways to improve the translation of accountability structures into the software design process while improving engagement with stakeholders, such as legislators, regulators, business executives and system developers. Finally, a key finding was the high demands that DASS-like research projects place on interdisciplinary teams: both in terms of team formation and sustainment, as well as, the specific demands of cross-disciplinary learning that covers both research methods, research dissemination, and career development.

SEFeb 26, 2024
Dealing with Data for RE: Mitigating Challenges while using NLP and Generative AI

Smita Ghaisas, Anmol Singhal

Across the dynamic business landscape today, enterprises face an ever-increasing range of challenges. These include the constantly evolving regulatory environment, the growing demand for personalization within software applications, and the heightened emphasis on governance. In response to these multifaceted demands, large enterprises have been adopting automation that spans from the optimization of core business processes to the enhancement of customer experiences. Indeed, Artificial Intelligence (AI) has emerged as a pivotal element of modern software systems. In this context, data plays an indispensable role. AI-centric software systems based on supervised learning and operating at an industrial scale require large volumes of training data to perform effectively. Moreover, the incorporation of generative AI has led to a growing demand for adequate evaluation benchmarks. Our experience in this field has revealed that the requirement for large datasets for training and evaluation introduces a host of intricate challenges. This book chapter explores the evolving landscape of Software Engineering (SE) in general, and Requirements Engineering (RE) in particular, in this era marked by AI integration. We discuss challenges that arise while integrating Natural Language Processing (NLP) and generative AI into enterprise-critical software systems. The chapter provides practical insights, solutions, and examples to equip readers with the knowledge and tools necessary for effectively building solutions with NLP at their cores. We also reflect on how these text data-centric tasks sit together with the traditional RE process. We also highlight new RE tasks that may be necessary for handling the increasingly important text data-centricity involved in developing software systems.

SEJul 3, 2025
Legal Requirements Translation from Law

Anmol Singhal, Travis Breaux

Software systems must comply with legal regulations, which is a resource-intensive task, particularly for small organizations and startups lacking dedicated legal expertise. Extracting metadata from regulations to elicit legal requirements for software is a critical step to ensure compliance. However, it is a cumbersome task due to the length and complex nature of legal text. Although prior work has pursued automated methods for extracting structural and semantic metadata from legal text, key limitations remain: they do not consider the interplay and interrelationships among attributes associated with these metadata types, and they rely on manual labeling or heuristic-driven machine learning, which does not generalize well to new documents. In this paper, we introduce an approach based on textual entailment and in-context learning for automatically generating a canonical representation of legal text, encodable and executable as Python code. Our representation is instantiated from a manually designed Python class structure that serves as a domain-specific metamodel, capturing both structural and semantic legal metadata and their interrelationships. This design choice reduces the need for large, manually labeled datasets and enhances applicability to unseen legislation. We evaluate our approach on 13 U.S. state data breach notification laws, demonstrating that our generated representations pass approximately 89.4% of test cases and achieve a precision and recall of 82.2 and 88.7, respectively.

SEJul 3, 2025
Requirements Elicitation Follow-Up Question Generation

Yuchen Shen, Anmol Singhal, Travis Breaux

Interviews are a widely used technique in eliciting requirements to gather stakeholder needs, preferences, and expectations for a software system. Effective interviewing requires skilled interviewers to formulate appropriate interview questions in real time while facing multiple challenges, including lack of familiarity with the domain, excessive cognitive load, and information overload that hinders how humans process stakeholders' speech. Recently, large language models (LLMs) have exhibited state-of-the-art performance in multiple natural language processing tasks, including text summarization and entailment. To support interviewers, we investigate the application of GPT-4o to generate follow-up interview questions during requirements elicitation by building on a framework of common interviewer mistake types. In addition, we describe methods to generate questions based on interviewee speech. We report a controlled experiment to evaluate LLM-generated and human-authored questions with minimal guidance, and a second controlled experiment to evaluate the LLM-generated questions when generation is guided by interviewer mistake types. Our findings demonstrate that, for both experiments, the LLM-generated questions are no worse than the human-authored questions with respect to clarity, relevancy, and informativeness. In addition, LLM-generated questions outperform human-authored questions when guided by common mistakes types. This highlights the potential of using LLMs to help interviewers improve the quality and ease of requirements elicitation interviews in real time.

CLMar 12, 2024
Generating Clarification Questions for Disambiguating Contracts

Anmol Singhal, Chirag Jain, Preethu Rose Anish et al.

Enterprises frequently enter into commercial contracts that can serve as vital sources of project-specific requirements. Contractual clauses are obligatory, and the requirements derived from contracts can detail the downstream implementation activities that non-legal stakeholders, including requirement analysts, engineers, and delivery personnel, need to conduct. However, comprehending contracts is cognitively demanding and error-prone for such stakeholders due to the extensive use of Legalese and the inherent complexity of contract language. Furthermore, contracts often contain ambiguously worded clauses to ensure comprehensive coverage. In contrast, non-legal stakeholders require a detailed and unambiguous comprehension of contractual clauses to craft actionable requirements. In this work, we introduce a novel legal NLP task that involves generating clarification questions for contracts. These questions aim to identify contract ambiguities on a document level, thereby assisting non-legal stakeholders in obtaining the necessary details for eliciting requirements. This task is challenged by three core issues: (1) data availability, (2) the length and unstructured nature of contracts, and (3) the complexity of legal text. To address these issues, we propose ConRAP, a retrieval-augmented prompting framework for generating clarification questions to disambiguate contractual text. Experiments conducted on contracts sourced from the publicly available CUAD dataset show that ConRAP with ChatGPT can detect ambiguities with an F2 score of 0.87. 70% of the generated clarification questions are deemed useful by human evaluators.