Arvind Agarwal

h-index74

9papers

1,099citations

Novelty49%

AI Score39

Ranked #80,222 of 194,257 authors (top 41%)#15,234 in CL (top 49%)

9 Papers

30.7CLFeb 8, 2021Code

VeeAlign: Multifaceted Context Representation using Dual Attention for Ontology Alignment

Vivek Iyer, Arvind Agarwal, Harshit Kumar

Ontology Alignment is an important research problem applied to various fields such as data integration, data transfer, data preparation, etc. State-of-the-art (SOTA) Ontology Alignment systems typically use naive domain-dependent approaches with handcrafted rules or domain-specific architectures, making them unscalable and inefficient. In this work, we propose VeeAlign, a Deep Learning based model that uses a novel dual-attention mechanism to compute the contextualized representation of a concept which, in turn, is used to discover alignments. By doing this, not only is our approach able to exploit both syntactic and semantic information encoded in ontologies, it is also, by design, flexible and scalable to different domains with minimal effort. We evaluate our model on four different datasets from different domains and languages, and establish its superiority through these results as well as detailed ablation studies. The code and datasets used are available at https://github.com/Remorax/VeeAlign.

4.9CLAug 12, 2025

LLM driven Text-to-Table Generation through Sub-Tasks Guidance and Iterative Refinement

Rajmohan C, Sarthak Harne, Arvind Agarwal

Transforming unstructured text into structured data is a complex task, requiring semantic understanding, reasoning, and structural comprehension. While Large Language Models (LLMs) offer potential, they often struggle with handling ambiguous or domain-specific data, maintaining table structure, managing long inputs, and addressing numerical reasoning. This paper proposes an efficient system for LLM-driven text-to-table generation that leverages novel prompting techniques. Specifically, the system incorporates two key strategies: breaking down the text-to-table task into manageable, guided sub-tasks and refining the generated tables through iterative self-feedback. We show that this custom task decomposition allows the model to address the problem in a stepwise manner and improves the quality of the generated table. Furthermore, we discuss the benefits and potential risks associated with iterative self-feedback on the generated tables while highlighting the trade-offs between enhanced performance and computational cost. Our methods achieve strong results compared to baselines on two complex text-to-table generation datasets available in the public domain.

6.1AIJun 22, 2021

Towards Automated Evaluation of Explanations in Graph Neural Networks

Vanya BK, Balaji Ganesan, Aniket Saxena et al.

Explaining Graph Neural Networks predictions to end users of AI applications in easily understandable terms remains an unsolved problem. In particular, we do not have well developed methods for automatically evaluating explanations, in ways that are closer to how users consume those explanations. Based on recent application trends and our own experiences in real world problems, we propose automatic evaluation approaches for GNN Explanations.

8.4AIOct 16, 2020

Multifaceted Context Representation using Dual Attention for Ontology Alignment

Vivek Iyer, Arvind Agarwal, Harshit Kumar

Ontology Alignment is an important research problem that finds application in various fields such as data integration, data transfer, data preparation etc. State-of-the-art (SOTA) architectures in Ontology Alignment typically use naive domain-dependent approaches with handcrafted rules and manually assigned values, making them unscalable and inefficient. Deep Learning approaches for ontology alignment use domain-specific architectures that are not only in-extensible to other datasets and domains, but also typically perform worse than rule-based approaches due to various limitations including over-fitting of models, sparsity of datasets etc. In this work, we propose VeeAlign, a Deep Learning based model that uses a dual-attention mechanism to compute the contextualized representation of a concept in order to learn alignments. By doing so, not only does our approach exploit both syntactic and semantic structure of ontologies, it is also, by design, flexible and scalable to different domains with minimal effort. We validate our approach on various datasets from different domains and in multilingual settings, and show its superior performance over SOTA methods.

1.0LGAug 20, 2019

Compliance Change Tracking in Business Process Services

Srikanth G Tamilselvam, Ankush Gupta, Arvind Agarwal

Regulatory compliance is an organization's adherence to laws, regulations, guidelines and specifications relevant to its business. Compliance officers responsible for maintaining adherence constantly struggle to keep up with the large amount of changes in regulatory requirements. Keeping up with the changes entail two main tasks: fetching the regulatory announcements that actually contain changes of interest, and incorporating those changes in the business process. In this paper we focus on the first task, and present a Compliance Change Tracking System, that gathers regulatory announcements from government sites, news sites, email subscriptions; classifies their importance i.e Actionability through a hierarchical classifier, and business process applicability through a multi-class classifier. For these classifiers, we experiment with several approaches such as vanilla classification methods (e.g. Naive Bayes, logistic regression etc.), hierarchical classification methods, rule based approach, hybrid approach with various preprocessing and feature selection methods; and show that despite the richness of other models, a simple hierarchical classification with bag-of-words features works the best for Actionability classifier and multi-class logistic regression works the best for Applicability classifier. The system has been deployed in global delivery centers, and has received positive feedback from payroll compliance officers.

17.4CLSep 15, 2017

A Deep Generative Framework for Paraphrase Generation

Ankush Gupta, Arvind Agarwal, Prawaan Singh et al.

Paraphrase generation is an important problem in NLP, especially in question answering, information retrieval, information extraction, conversation systems, to name a few. In this paper, we address the problem of generating paraphrases automatically. Our proposed method is based on a combination of deep generative models (VAE) with sequence-to-sequence models (LSTM) to generate paraphrases, given an input sentence. Traditional VAEs when combined with recurrent neural networks can generate free text but they are not suitable for paraphrase generation for a given sentence. We address this problem by conditioning the both, encoder and decoder sides of VAE, on the original sentence, so that it can generate the given sentence's paraphrases. Unlike most existing models, our model is simple, modular and can generate multiple paraphrases, for a given sentence. Quantitative evaluation of the proposed method on a benchmark paraphrase dataset demonstrates its efficacy, and its performance improvement over the state-of-the-art methods by a significant margin, whereas qualitative human evaluation indicate that the generated paraphrases are well-formed, grammatically correct, and are relevant to the input sentence. Furthermore, we evaluate our method on a newly released question paraphrase dataset, and establish a new baseline for future research.

11.2CLSep 13, 2017

Dialogue Act Sequence Labeling using Hierarchical encoder with CRF

Harshit Kumar, Arvind Agarwal, Riddhiman Dasgupta et al.

Dialogue Act recognition associate dialogue acts (i.e., semantic labels) to utterances in a conversation. The problem of associating semantic labels to utterances can be treated as a sequence labeling problem. In this work, we build a hierarchical recurrent neural network using bidirectional LSTM as a base unit and the conditional random field (CRF) as the top layer to classify each utterance into its corresponding dialogue act. The hierarchical network learns representations at multiple levels, i.e., word level, utterance level, and conversation level. The conversation level representations are input to the CRF layer, which takes into account not only all previous utterances but also their dialogue acts, thus modeling the dependency among both, labels and utterances, an important consideration of natural dialogue. We validate our approach on two different benchmark data sets, Switchboard and Meeting Recorder Dialogue Act, and show performance improvement over the state-of-the-art methods by $2.2\%$ and $4.1\%$ absolute points, respectively. It is worth noting that the inter-annotator agreement on Switchboard data set is $84\%$, and our method is able to achieve the accuracy of about $79\%$ despite being trained on the noisy data.

4.8IRJun 23, 2016

A Relevant Content Filtering Based Framework For Data Stream Summarization

Cailing Dong, Arvind Agarwal

Social media platforms are a rich source of information these days, however, of all the available information, only a small fraction is of users' interest. To help users catch up with the latest topics of their interests from the large amount of information available in social media, we present a relevant content filtering based framework for data stream summarization. More specifically, given the topic or event of interest, this framework can dynamically discover and filter out relevant information from irrelevant information in the stream of text provided by social media platforms. It then further captures the most representative and up-to-date information to generate a sequential summary or event story line along with the evolution of the topic or event. Our framework does not depend on any labeled data, it instead uses the weak supervision provided by the user, which matches the real scenarios of users searching for information about an ongoing event. We experimented on two real events traced by a Twitter dataset from TREC 2011. The results verified the effectiveness of relevant content filtering and sequential summary generation of the proposed framework. It also shows its robustness of using the most easy-to-obtain weak supervision, i.e., trending topic or hashtag. Thus, this framework can be easily integrated into social media platforms such as Twitter to generate sequential summaries for the events of interest. We also make the manually generated gold-standard sequential summaries of the two test events publicly available for future use in the community.

1.4LGApr 25, 2014

Multitask Learning for Sequence Labeling Tasks

Arvind Agarwal, Saurabh Kataria

In this paper, we present a learning method for sequence labeling tasks in which each example sequence has multiple label sequences. Our method learns multiple models, one model for each label sequence. Each model computes the joint probability of all label sequences given the example sequence. Although each model considers all label sequences, its primary focus is only one label sequence, and therefore, each model becomes a task-specific model, for the task belonging to that primary label. Such multiple models are learned {\it simultaneously} by facilitating the learning transfer among models through {\it explicit parameter sharing}. We experiment the proposed method on two applications and show that our method significantly outperforms the state-of-the-art method.