Amarnath Gupta

DB
h-index 3
11 papers
2 citations
Novelty 35%
AI Score 46

11 Papers

DB · May 18
Feasible Plan Generation with Ambiguity-Boundedness in Cross-Model Query Processing

Subhasis Dasgupta, Amarnath Gupta

Natural language (NL) interfaces to databases broaden access to heterogeneous data but often yield many ambiguous intermediate logical plans (ILPs) due to uncertain operator scope and predicate semantics. Many candidates are infeasible because of type mismatches, missing bindings, or engine-specific constraints. We address this challenge with feasibility constraints for detecting local inconsistencies and introduce the Packed Plan Forest (PPF), a polynomially bounded structure that compactly encodes all feasible ILPs while pruning infeasible ones early. Extending packed parse forest ideas to multi-model settings, PPF supports efficient feasibility analysis through annotated operators. Formal results show polynomial size under bounded arity and annotation vocabularies, and experiments confirm that PPFs capture exponentially many ILPs with minimal overhead, establishing a scalable foundation for NL-to-DB query planning across heterogeneous systems.
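To make the idea of a packed structure concrete, here is a minimal Python sketch of a generic packed forest in which alternative operator readings share sub-plans, so a polynomial number of nodes can encode exponentially many complete plans. The classes `OrNode` and `OpNode`, the boolean `feasible` flag (a hypothetical stand-in for a local feasibility check), and the `build_chain` example are illustrative assumptions, not the PPF definition from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class OpNode:
    """One operator reading; all of its children must be resolved (AND semantics)."""
    op: str
    children: List["OrNode"] = field(default_factory=list)
    feasible: bool = True  # hypothetical stand-in for a local feasibility check


@dataclass(eq=False)
class OrNode:
    """A packed node: alternative operator readings of the same sub-query (OR semantics)."""
    alternatives: List[OpNode] = field(default_factory=list)


def count_plans(node: OrNode, memo: Optional[Dict[int, int]] = None) -> int:
    """Count the feasible complete plans encoded under `node`.

    Memoizing on node identity means each shared sub-forest is evaluated once,
    so the work stays linear in the forest size even when the count is exponential.
    """
    if memo is None:
        memo = {}
    if id(node) in memo:
        return memo[id(node)]
    total = 0
    for alt in node.alternatives:
        if not alt.feasible:
            continue  # prune infeasible readings before combining children
        product = 1
        for child in alt.children:
            product *= count_plans(child, memo)
        total += product
    memo[id(node)] = total
    return total


def build_chain(depth: int) -> OrNode:
    """Hypothetical example: every level has two readings over one shared child."""
    node = OrNode([OpNode("scan")])
    for i in range(depth):
        node = OrNode([OpNode(f"join_{i}", [node]), OpNode(f"filter_{i}", [node])])
    return node


root = build_chain(20)
print(count_plans(root))  # 1048576 plans, stored in 21 OrNode and 41 OpNode objects
```

Marking one reading per level as `feasible=False` drops the count to 1. In this sketch, pruning happens inside the packed structure rather than over an enumerated list of plans.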

DB · Mar 14
MICRO: A Lightweight Middleware for Optimizing Cross-store Cross-model Graph-Relation Joins [Technical Report]

Xiuwen Zheng, Arun Kumar, Amarnath Gupta

Modern data applications increasingly involve heterogeneous data managed in different models and stored across disparate database engines, often deployed as separate installs. Limited research has addressed cross-model query processing in federated environments. This paper takes a step toward bridging this gap by: (1) formally defining a class of cross-model join queries between a graph store and a relational store by proposing a unified algebra; (2) introducing one real-world benchmark and four semi-synthetic benchmarks to evaluate such queries; and (3) proposing a lightweight middleware, MICRO, for efficient query execution. At the core of MICRO is CMLero, a learning-to-rank-based query optimizer that selects efficient execution plans without requiring exact cost estimation. By avoiding the need to materialize or convert all data into a single model, which is often infeasible due to third-party data control or cost, MICRO enables native querying across heterogeneous systems. Experimental results on the benchmark workloads demonstrate that MICRO outperforms the state-of-the-art federated relational system XDB by up to 2.1x in total runtime across the full test set. On the 93 test queries of the real-world benchmark, 14 queries achieve over 10x speedup, including 4 queries with more than 100x speedup; however, 4 queries experience slowdowns of over 5 seconds, highlighting opportunities for future improvement of MICRO. Further comparisons show that CMLero consistently outperforms rule-based and regression-based optimizers, highlighting the advantage of learning-to-rank in complex cross-model optimization.
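The key point of learning-to-rank here is that the optimizer only needs to order candidate plans correctly, not predict their exact cost. The sketch below illustrates that idea with a generic pairwise perceptron, which is a swapped-in textbook technique and not CMLero's actual model. The plan features (`[log10 intermediate-result size, cross-store transfers]`), the runtimes, and the helper names are all hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def score(w: Features, x: Features) -> float:
    """Linear ranking score; only the relative order of scores matters."""
    return sum(wi * xi for wi, xi in zip(w, x))


def make_pairs(plans: List[Tuple[Features, float]]) -> List[Tuple[Features, Features]]:
    """Turn (features, measured_runtime) records for one query into (faster, slower) pairs."""
    return [(fa, fb) for fa, ta in plans for fb, tb in plans if ta < tb]


def train_pairwise(pairs, dim: int, epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Pairwise perceptron: nudge w whenever a faster plan is not ranked strictly higher."""
    w = [0.0] * dim
    for _ in range(epochs):
        for faster, slower in pairs:
            if score(w, faster) <= score(w, slower):
                w = [wi + lr * (f - s) for wi, f, s in zip(w, faster, slower)]
    return w


def pick_plan(w: Features, candidates: List[Features]) -> int:
    """Return the index of the candidate plan with the highest learned score."""
    return max(range(len(candidates)), key=lambda i: score(w, candidates[i]))


# Hypothetical features per plan: [log10 intermediate-result size, cross-store transfers]
history = [([3.0, 1.0], 2.5), ([5.0, 2.0], 9.0), ([4.0, 1.0], 4.0)]
w = train_pairwise(make_pairs(history), dim=2)
print(pick_plan(w, [[6.0, 2.0], [2.0, 1.0], [4.5, 3.0]]))  # 1
```

Because training only enforces "faster ranks above slower," the learned weights carry no absolute runtime meaning, which is the sense in which such an optimizer avoids exact cost estimation.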

DB · Apr 12
Natural Language to What? A Vision for Intermediate Representations in NL-to-X Querying

Shengqi Li, Amarnath Gupta

Natural-language-initiated querying is usually framed as translation into a predetermined backend language such as SQL, Cypher, or SPARQL. That framing is appropriate when the semantic target is known in advance, but it does not cover the full space of natural-language query workloads. In document-centric, mixed, and heterogeneous environments, the first semantic problem may be to determine what target should be constructed before backend-specific execution can begin. This paper proposes the NLIQ lens for this broader space. It introduces target adequacy as the criterion for distinguishing settings in which the target is given, only partially specified, or must itself be constructed, and argues that intermediate representations in the latter regimes are not merely implementation devices but first-class semantic objects. The paper develops a compact framework of NLIQ regimes, illustrates the distinction through representative examples, and identifies a new research terrain around semantic target formation, intermediate representation design, heterogeneous compilation, and answer formation in complex data environments.

CL · Aug 4, 2025
Can LLMs Generate High-Quality Task-Specific Conversations?

Shengqi Li, Amarnath Gupta

This paper introduces a parameterization framework for controlling conversation quality in large language models. We explore nine key parameters across six dimensions that enable precise specification of dialogue properties. Through experiments with state-of-the-art LLMs, we demonstrate that parameter-based control produces statistically significant differences in generated conversation properties. Our approach addresses challenges in conversation generation, including topic coherence, knowledge progression, character consistency, and control granularity. The framework provides a standardized method for conversation quality control with applications in education, therapy, customer service, and entertainment. Future work will focus on implementing additional parameters through architectural modifications and developing benchmark datasets for evaluation.

AI · Jul 7, 2025
OLG++: A Semantic Extension of Obligation Logic Graph

Subhasis Dasgupta, Jon Stephens, Amarnath Gupta

We present OLG++, a semantic extension of the Obligation Logic Graph (OLG) for modeling regulatory and legal rules in municipal and interjurisdictional contexts. OLG++ introduces richer node and edge types, including spatial, temporal, party group, defeasibility, and logical grouping constructs, enabling nuanced representations of legal obligations, exceptions, and hierarchies. The model supports structured reasoning over rules with contextual conditions, precedence, and complex triggers. We demonstrate its expressiveness through examples from food business regulations, showing how OLG++ supports legal question answering using property graph queries. OLG++ also improves over LegalRuleML by providing native support for subClassOf, spatial constraints, and reified exception structures. Our examples show that OLG++ is more expressive than prior graph-based models for legal knowledge representation.
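The combination of subClassOf inheritance, spatial scope, and defeasible exceptions can be illustrated in a few lines. The following is a plain-Python stand-in for what OLG++ would express as property graph queries. The class hierarchy, the jurisdiction `CityA`, the rule texts, and names such as `Obligation` and `RuleException` are hypothetical and not taken from any real regulation or from the OLG++ schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical subClassOf edges between party classes (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "FoodTruck": "MobileFoodFacility",
    "MobileFoodFacility": "FoodBusiness",
    "Restaurant": "FoodBusiness",
}


def ancestors(cls: str) -> List[str]:
    """The class itself plus every superclass reachable via subClassOf."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


@dataclass
class RuleException:
    """A reified exception: if its condition holds, it defeats the obligation."""
    condition: Callable[[dict], bool]
    note: str


@dataclass
class Obligation:
    id: str
    applies_to: str    # party class; inherited by its subclasses
    jurisdiction: str  # spatial scope of the rule
    duty: str
    exceptions: List[RuleException] = field(default_factory=list)


def applicable_obligations(case: dict, obligations: List[Obligation]) -> List[str]:
    """IDs of obligations in scope for this case and not defeated by an exception."""
    kinds = set(ancestors(case["type"]))
    result = []
    for ob in obligations:
        if ob.applies_to not in kinds or ob.jurisdiction != case["jurisdiction"]:
            continue  # wrong party class or outside the rule's spatial scope
        if any(ex.condition(case) for ex in ob.exceptions):
            continue  # defeasibility: a matching exception overrides the duty
        result.append(ob.id)
    return result


# Hypothetical rules, not taken from any real regulation.
rules = [
    Obligation("O1", "FoodBusiness", "CityA", "hold a health permit"),
    Obligation("O2", "MobileFoodFacility", "CityA", "operate only in approved zones",
               [RuleException(lambda c: c.get("private_event", False), "private events exempt")]),
    Obligation("O3", "Restaurant", "CityA", "post inspection grade"),
]
print(applicable_obligations({"type": "FoodTruck", "jurisdiction": "CityA", "private_event": True}, rules))
# ['O1']
```

A food truck inherits the general food-business duty through subClassOf, while the mobile-facility duty is defeated by the reified exception and the restaurant duty never applies.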

AI · Jan 24, 2025
MISCON: A Mission-Driven Conversational Consultant for Pre-Venture Entrepreneurs in Food Deserts

Subhasis Dasgupta, Hans Taparia, Laura Schmidt et al.

This work-in-progress report describes MISCON, a conversational consultant being developed for a public mission project called NOURISH. With MISCON, aspiring small business owners in a food-insecure region and their advisors in community-based organizations would be able to get information, recommendations, and analysis regarding setting up food businesses. MISCON conversations are modeled as a state machine that uses a heterogeneous knowledge graph as well as several analytical tools and services, including a variety of LLMs. In this short report, we present the functional architecture and some design considerations behind MISCON.
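A conversation modeled as a state machine can be sketched as a transition table keyed by the current state and the user's intent. In the minimal Python sketch below, the states, intent strings, and the `TRANSITIONS` table are hypothetical; in a system like MISCON each state would presumably call the knowledge graph, analytical tools, or LLMs, whereas here intents are supplied directly.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    GREET = auto()
    COLLECT_PROFILE = auto()
    RECOMMEND = auto()
    ANALYZE = auto()
    DONE = auto()


# Hypothetical transition table keyed by (current state, user intent).
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.GREET, "start"): State.COLLECT_PROFILE,
    (State.COLLECT_PROFILE, "profile_complete"): State.RECOMMEND,
    (State.RECOMMEND, "ask_analysis"): State.ANALYZE,
    (State.RECOMMEND, "finish"): State.DONE,
    (State.ANALYZE, "back"): State.RECOMMEND,
    (State.ANALYZE, "finish"): State.DONE,
}


def step(state: State, intent: str) -> State:
    """Follow a transition if one exists; unrecognized intents keep the current state."""
    return TRANSITIONS.get((state, intent), state)


def run(intents: List[str], start: State = State.GREET) -> List[State]:
    """Replay a sequence of intents and return the visited states."""
    trace = [start]
    for intent in intents:
        trace.append(step(trace[-1], intent))
    return trace


print([s.name for s in run(["start", "profile_complete", "ask_analysis", "back", "finish"])])
# ['GREET', 'COLLECT_PROFILE', 'RECOMMEND', 'ANALYZE', 'RECOMMEND', 'DONE']
```

Keeping unknown intents in the current state is one simple design choice; a real consultant might instead route them to a clarification state.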

CL · Jan 16, 2022
Temporal Relation Extraction with a Graph-Based Deep Biaffine Attention Model

Bo-Ying Su, Shang-Ling Hsu, Kuan-Yin Lai et al.

Temporal information extraction plays a critical role in natural language understanding. Previous systems have incorporated advanced neural language models and have successfully enhanced the accuracy of temporal information extraction tasks. However, these systems have two major shortcomings. First, they fail to make use of the two-sided nature of temporal relations in prediction. Second, they involve non-parallelizable pipelines in the inference process that bring little performance gain. To this end, we propose a novel temporal information extraction model based on deep biaffine attention to extract temporal relationships between events in unstructured text efficiently and accurately. Our model is efficient because it performs relation extraction directly instead of treating event annotation as a prerequisite of relation extraction. Moreover, our architecture uses Multilayer Perceptrons (MLP) with biaffine attention to predict arcs and relation labels separately, improving relation detection accuracy by exploiting the two-sided nature of temporal relationships. We experimentally demonstrate that our model achieves state-of-the-art performance in temporal relation extraction.
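The "two-sided" property comes from the biaffine form itself: with a non-symmetric weight matrix, the score for event i relative to event j differs from the score for j relative to i. The sketch below uses the standard biaffine scorer popularized in dependency parsing, s = hᵢᵀ U hⱼ + wᵀ[hᵢ; hⱼ] + b, with separate MLP views for the source and target roles and one scorer per label. The 2-dimensional encodings, the weights, and the `BEFORE`/`AFTER` label set are toy assumptions, not the paper's trained parameters or exact architecture.

```python
from typing import Dict, List, Tuple

Vec = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vec) -> Vec:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mlp(W: Matrix, b: Vec, v: Vec) -> Vec:
    """One-layer MLP with ReLU giving each event a role-specific (source or target) view."""
    return [max(0.0, z + bi) for z, bi in zip(matvec(W, v), b)]


def biaffine(h_src: Vec, h_tgt: Vec, U: Matrix, w: Vec, bias: float) -> float:
    """s = h_src^T U h_tgt + w^T [h_src; h_tgt] + bias."""
    return dot(h_src, matvec(U, h_tgt)) + dot(w, h_src + h_tgt) + bias


def predict_label(h_src: Vec, h_tgt: Vec, params: Dict[str, Tuple[Matrix, Vec, float]]) -> str:
    """Pick the relation label whose own biaffine scorer gives the highest score."""
    return max(params, key=lambda label: biaffine(h_src, h_tgt, *params[label]))


# Toy 2-d encodings for three events (hypothetical values).
events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2, zero = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
src = [mlp(I2, zero, e) for e in events]
tgt = [mlp(I2, zero, e) for e in events]

U = [[0.0, 2.0], [-1.0, 0.0]]    # asymmetric, so s(i, j) != s(j, i)
U_t = [[0.0, -1.0], [2.0, 0.0]]  # transpose of U
labels = {"BEFORE": (U, [0.0] * 4, 0.0), "AFTER": (U_t, [0.0] * 4, 0.0)}
arc = [[biaffine(s, t, U, [0.0] * 4, 0.0) for t in tgt] for s in src]
print(arc[0][1], arc[1][0])                   # 2.0 -1.0
print(predict_label(src[0], tgt[1], labels))  # BEFORE
print(predict_label(src[1], tgt[0], labels))  # AFTER
```

Scoring arcs with one scorer and labels with per-label scorers mirrors the "predict arcs and relation labels separately" design described above, under the stated toy assumptions.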

DB · Sep 11, 2021
Discovering Technology Gaps using the IntSight Knowledge Navigator

Aurpon Gupta, Subhasis Dasgupta, Snehasis Sinha et al.

Knowledge analysis is an important application of knowledge graphs. In this paper, we present a complex knowledge analysis problem that discovers the gaps in the technology areas of interest to an organization. Our knowledge graph is developed on a heterogeneous data management platform. The analysis combines semantic search, graph analytics, and polystore query optimization.

CL · Apr 18, 2021
News Meets Microblog: Hashtag Annotation via Retriever-Generator

Xiuwen Zheng, Dheeraj Mekala, Amarnath Gupta et al.

Hashtag annotation for microblog posts has been recently formulated as a sequence generation problem to handle emerging hashtags that are unseen in the training set. The state-of-the-art method leverages conversations initiated by posts to enrich contextual information for the short posts. However, it is unrealistic to assume the existence of conversations before the hashtag annotation itself. Therefore, we propose to leverage news articles published before the microblog post to generate hashtags following a Retriever-Generator framework. Extensive experiments on English Twitter datasets demonstrate superior performance and significant advantages of leveraging news articles to generate hashtags.
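The retrieval step differs from conversation-based context in one important constraint: only news published before the post may be used, so no future information leaks into annotation. The minimal Python sketch below shows that constraint with a bag-of-words cosine retriever, which is a simple stand-in and not necessarily the retriever used in the paper. The article records, the `[SEP]` input format, and `generator_input` are hypothetical, and the seq2seq generator itself is not shown.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Dict, List


def bow(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def retrieve(post: str, post_time: datetime, articles: List[Dict], k: int = 2) -> List[Dict]:
    """Top-k articles by cosine similarity, restricted to those published before the post."""
    q = bow(post)
    eligible = [a for a in articles if a["published"] < post_time]  # no look-ahead
    eligible.sort(key=lambda a: cosine(q, bow(a["text"])), reverse=True)
    return eligible[:k]


def generator_input(post: str, retrieved: List[Dict]) -> str:
    """Hypothetical input format for a seq2seq hashtag generator."""
    return " [SEP] ".join([post] + [a["text"] for a in retrieved])


articles = [
    {"id": "A1", "published": datetime(2021, 3, 1), "text": "Earthquake strikes coastal city"},
    {"id": "A2", "published": datetime(2021, 3, 5), "text": "Election results announced today"},
    {"id": "A3", "published": datetime(2021, 3, 20), "text": "Earthquake aftershocks continue"},
]
post = "praying for everyone after the earthquake"
hits = retrieve(post, datetime(2021, 3, 10), articles, k=1)
print(generator_input(post, hits))
# praying for everyone after the earthquake [SEP] Earthquake strikes coastal city
```

Article A3 also mentions the earthquake but is excluded because it appears after the post, which is exactly the situation the time filter is meant to handle.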

DB · May 24, 2019
Multi-Model Investigative Exploration of Social Media Data with boutique: A Case Study in Public Health

Junan Guo, Subhasis Dasgupta, Amarnath Gupta

We present our experience with a data science problem in Public Health, where researchers use social media (Twitter) to determine whether the public shows awareness of HIV prevention measures offered by Public Health campaigns. To help these researchers, we developed an investigative exploration system called boutique that allows a user to perform multi-step visualization and exploration of data through a dashboard interface. Unique features of boutique include its ability to handle heterogeneous types of data provided by a polystore and its ability to use computation as part of the investigative exploration process. In this paper, we present the design of the boutique middleware and walk through an investigation process for a real-life problem.

DB · May 20, 2019
Ingesting High-Velocity Streaming Graphs from Social Media Sources

Subhasis Dasgupta, Aditya Bagchi, Amarnath Gupta

Many data science applications like social network analysis use graphs as their primary form of data. However, acquiring graph-structured data from social media presents some interesting challenges. The first challenge is the high data velocity and bursty nature of social media data. The second challenge is that the complex nature of the data makes the ingestion process expensive. If we want to store the streaming graph data in a graph database, we face a third challenge: the database is very often unable to sustain the ingestion of high-velocity, high-burst data. We have developed an adaptive buffering mechanism and a graph compression technique that effectively mitigate the problem. A novel aspect of our method is that the adaptive buffering algorithm uses the data rate, the data content, and the CPU resources of the database machine to determine an optimal data ingestion mechanism. We further show that an ingestion-time graph compression strategy improves the efficiency of data ingestion into the database. We have verified the efficacy of our ingestion optimization strategy through extensive experiments.
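To show how data rate, data content, and database CPU load could jointly drive buffering, here is a minimal Python sketch. The batch-size policy in `next_batch_size`, its thresholds and multipliers, and the choice of collapsing repeated edges into weighted edges as the compression step are all hypothetical illustrations, not the algorithm or compression scheme from the paper.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)


def duplicate_ratio(edges: List[Edge]) -> float:
    """Fraction of edges in a batch that repeat an earlier (source, label, target)."""
    return 1 - len(set(edges)) / len(edges) if edges else 0.0


def compress_batch(edges: List[Edge]) -> List[Tuple[str, str, str, int]]:
    """Collapse repeated edges into one weighted edge each (first-seen order)."""
    return [(src, lbl, tgt, n) for (src, lbl, tgt), n in Counter(edges).items()]


def next_batch_size(rate_per_s: float, cpu_idle: float, dup_ratio: float = 0.0,
                    base: int = 500, max_size: int = 20_000) -> int:
    """Hypothetical adaptive policy combining data rate, data content, and DB CPU load.

    rate_per_s: observed incoming edges per second
    cpu_idle:   idle CPU fraction on the database machine, in [0, 1]
    dup_ratio:  duplicate_ratio() of recent batches
    """
    target = max(base, int(rate_per_s))  # roughly one flush per second of arrivals
    if cpu_idle < 0.2:
        target *= 4  # busy database: fewer, larger writes
    elif cpu_idle < 0.5:
        target *= 2
    if dup_ratio > 0.3:
        target *= 2  # repetitive content compresses well, so bigger batches pay off
    return min(target, max_size)


batch = [("u1", "retweets", "u2"), ("u1", "retweets", "u2"), ("u3", "mentions", "u2")]
print(compress_batch(batch))  # [('u1', 'retweets', 'u2', 2), ('u3', 'mentions', 'u2', 1)]
print(next_batch_size(3000, cpu_idle=0.1, dup_ratio=duplicate_ratio(batch)))  # 20000
```

The cap `max_size` bounds memory during bursts. A real system would tune such parameters from measurements rather than fix them as constants.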