CVFeb 28, 2023
VQA with Cascade of Self- and Co-Attention BlocksAakansha Mishra, Ashish Anand, Prithwijit Guha
The use of complex attention modules has improved the performance of the Visual Question Answering (VQA) task. This work aims to learn an improved multi-modal representation through dense interaction of visual and textual modalities. The proposed model has an attention block containing both self-attention and co-attention on image and text. The self-attention modules provide the contextual information of objects (for an image) and words (for a question) that are crucial for inferring an answer. On the other hand, co-attention aids the interaction of image and text. Further, fine-grained information is obtained from two modalities by using a Cascade of Self- and Co-Attention blocks (CSCA). This proposal is benchmarked on the widely used VQA2.0 and TDIUC datasets. The efficacy of key components of the model and cascading of attention modules are demonstrated by experiments involving ablation analysis.
CLJan 15Code
AWED-FiNER: Agents, Web applications, and Expert Detectors for Fine-grained Named Entity Recognition across 36 Languages for 6.6 Billion SpeakersPrachuryya Kaushik, Ashish Anand
Named Entity Recognition (NER) is a foundational task in Natural Language Processing (NLP) and Information Retrieval (IR), which facilitates semantic search and structured data extraction. We introduce \textbf{AWED-FiNER}, an open-source collection of agentic tool, web application, and 53 state-of-the-art expert models that provide Fine-grained Named Entity Recognition (FgNER) solutions across 36 languages spoken by more than 6.6 billion people. The agentic tool enables routing multilingual text to specialized expert models to fetch FgNER annotations within seconds. The web-based platform provides a ready-to-use FgNER annotation service for non-technical users. Moreover, the collection of language-specific extremely small open-source state-of-the-art expert models facilitates offline deployment in resource-constrained scenarios, including edge devices. AWED-FiNER covers languages spoken by over 6.6 billion people, ranging from global languages like English, Chinese, Spanish, and Hindi, to low-resource languages like Assamese, Santali, and Odia, along with a specific focus on extremely low-resource vulnerable languages such as Bodo, Manipuri, Bishnupriya, and Mizo. The resources can be accessed here: Agentic Tool (https://github.com/PrachuryyaKaushik/AWED-FiNER), Web Application (https://hf.co/spaces/prachuryyaIITG/AWED-FiNER), and 53 Expert Detector Models (https://hf.co/collections/prachuryyaIITG/awed-finer).
CLNov 21, 2023
Noise in Relation Classification Dataset TACRED: Characterization and ReductionAkshay Parekh, Ashish Anand, Amit Awekar
The overarching objective of this paper is two-fold. First, to explore model-based approaches to characterize the primary cause of the noise. in the RE dataset TACRED Second, to identify the potentially noisy instances. Towards the first objective, we analyze predictions and performance of state-of-the-art (SOTA) models to identify the root cause of noise in the dataset. Our analysis of TACRED shows that the majority of the noise in the dataset originates from the instances labeled as no-relation which are negative examples. For the second objective, we explore two nearest-neighbor-based strategies to automatically identify potentially noisy examples for elimination and reannotation. Our first strategy, referred to as Intrinsic Strategy (IS), is based on the assumption that positive examples are clean. Thus, we have used false-negative predictions to identify noisy negative examples. Whereas, our second approach, referred to as Extrinsic Strategy, is based on using a clean subset of the dataset to identify potentially noisy negative examples. Finally, we retrained the SOTA models on the eliminated and reannotated dataset. Our empirical results based on two SOTA models trained on TACRED-E following the IS show an average 4% F1-score improvement, whereas reannotation (TACRED-R) does not improve the original results. However, following ES, SOTA models show the average F1-score improvement of 3.8% and 4.4% when trained on respective eliminated (TACRED-EN) and reannotated (TACRED-RN) datasets respectively. We further extended the ES for cleaning positive examples as well, which resulted in an average performance improvement of 5.8% and 5.6% for the eliminated (TACRED-ENP) and reannotated (TACRED-RNP) datasets respectively.
CLNov 23, 2021Code
CL-NERIL: A Cross-Lingual Model for NER in Indian LanguagesAkshara Prabhakar, Gouri Sankar Majumder, Ashish Anand
Developing Named Entity Recognition (NER) systems for Indian languages has been a long-standing challenge, mainly owing to the requirement of a large amount of annotated clean training instances. This paper proposes an end-to-end framework for NER for Indian languages in a low-resource setting by exploiting parallel corpora of English and Indian languages and an English NER dataset. The proposed framework includes an annotation projection method that combines word alignment score and NER tag prediction confidence score on source language (English) data to generate weakly labeled data in a target Indian language. We employ a variant of the Teacher-Student model and optimize it jointly on the pseudo labels of the Teacher model and predictions on the generated weakly labeled data. We also present manually annotated test sets for three Indian languages: Hindi, Bengali, and Gujarati. We evaluate the performance of the proposed framework on the test sets of the three Indian languages. Empirical results show a minimum 10% performance improvement compared to the zero-shot transfer learning model on all languages. This indicates that weakly labeled data generated using the proposed annotation projection method in target Indian languages can complement well-annotated source language data to enhance performance. Our code is publicly available at https://github.com/aksh555/CL-NERIL
CRJan 14
Malware Classification using Diluted Convolutional Neural Network with Fast Gradient Sign MethodAshish Anand, Bhupendra Singh, Sunil Khemka et al.
Android malware has become an increasingly critical threat to organizations, society and individuals, posing significant risks to privacy, data security and infrastructure. As malware continues to evolve in terms of complexity and sophistication, the mitigation and detection of these malicious software instances have become more time consuming and challenging particularly due to the requirement of large number of features to identify potential malware. To address these challenges, this research proposes Fast Gradient Sign Method with Diluted Convolutional Neural Network (FGSM DICNN) method for malware classification. DICNN contains diluted convolutions which increases receptive field, enabling the model to capture dispersed malware patterns across long ranges using fewer features without adding parameters. Additionally, the FGSM strategy enhance the accuracy by using one-step perturbations during training that provides more defensive advantage of lower computational cost. This integration helps to manage high classification accuracy while reducing the dependence on extensive feature sets. The proposed FGSM DICNN model attains 99.44% accuracy while outperforming other existing approaches such as Custom Deep Neural Network (DCNN).
CLOct 18, 2025
End-to-End Argument Mining through Autoregressive Argumentative Structure PredictionNilmadhab Das, Vishal Vaibhav, Yash Sunil Choudhary et al.
Argument Mining (AM) helps in automating the extraction of complex argumentative structures such as Argument Components (ACs) like Premise, Claim etc. and Argumentative Relations (ARs) like Support, Attack etc. in an argumentative text. Due to the inherent complexity of reasoning involved with this task, modelling dependencies between ACs and ARs is challenging. Most of the recent approaches formulate this task through a generative paradigm by flattening the argumentative structures. In contrast to that, this study jointly formulates the key tasks of AM in an end-to-end fashion using Autoregressive Argumentative Structure Prediction (AASP) framework. The proposed AASP framework is based on the autoregressive structure prediction framework that has given good performance for several NLP tasks. AASP framework models the argumentative structures as constrained pre-defined sets of actions with the help of a conditional pre-trained language model. These actions build the argumentative structures step-by-step in an autoregressive manner to capture the flow of argumentative reasoning in an efficient way. Extensive experiments conducted on three standard AM benchmarks demonstrate that AASP achieves state-of-theart (SoTA) results across all AM tasks in two benchmarks and delivers strong results in one benchmark.
CLMay 31, 2025
AKReF: An argumentative knowledge representation framework for structured argumentationDebarati Bhattacharjee, Ashish Anand
This paper presents a framework to convert argumentative texts into argument knowledge graphs (AKG). The proposed argumentative knowledge representation framework (AKReF) extends the theoretical foundation and enables the AKG to provide a graphical view of the argumentative structure that is easier to understand. Starting with basic annotations of argumentative components (ACs) and argumentative relations (ARs), we enrich the information by constructing a knowledge base (KB) graph with metadata attributes for nodes. Next, we apply modus ponens on premises and inference rules from the KB to form arguments. From these arguments, we create an AKG. The nodes and edges of the AKG have attributes capturing key argumentative features such as the type of premise (e.g., axiom, ordinary premise, assumption), the type of inference rule (e.g., strict, defeasible), preference order over defeasible rules, markers (e.g., "therefore", "however"), and the type of attack (e.g., undercut, rebuttal, undermining). We identify inference rules by locating a specific set of markers, called inference markers (IM). This, in turn, makes it possible to identify undercut attacks previously undetectable in existing datasets. AKG prepares the ground for reasoning tasks, including checking the coherence of arguments and identifying opportunities for revision. For this, it is essential to find indirect relations, many of which are implicit. Our proposed AKG format, with annotated inference rules and modus ponens, helps reasoning models learn the implicit, indirect relations that require inference over arguments and their interconnections. We use an essay from the AAEC dataset to illustrate the framework. We further show its application in complex analyses such as extracting a conflict-free set and a maximal set of admissible arguments.
CLJun 12, 2024
Exploration of Marker-Based Approaches in Argument Mining through Augmented Natural LanguageNilmadhab Das, Vishal Choudhary, V. Vijaya Saradhi et al.
Argument Mining (AM) involves identifying and extracting Argumentative Components (ACs) and their corresponding Argumentative Relations (ARs). Most of the prior works have broken down these tasks into multiple sub-tasks. Existing end-to-end setups primarily use the dependency parsing approach. This work introduces a generative paradigm-based end-to-end framework argTANL. argTANL frames the argumentative structures into label-augmented text, called Augmented Natural Language (ANL). This framework jointly extracts both ACs and ARs from a given argumentative text. Additionally, this study explores the impact of Argumentative and Discourse markers on enhancing the model's performance within the proposed framework. Two distinct frameworks, Marker-Enhanced argTANL (ME-argTANL) and argTANL with specialized Marker-Based Fine-Tuning, are proposed to achieve this. Extensive experiments are conducted on three standard AM benchmarks to demonstrate the superior performance of the ME-argTANL.
CLDec 26, 2021
Budget Sensitive Reannotation of Noisy Relation Classification Data Using Label HierarchyAkshay Parekh, Ashish Anand, Amit Awekar
Large crowd-sourced datasets are often noisy and relation classification (RC) datasets are no exception. Reannotating the entire dataset is one probable solution however it is not always viable due to time and budget constraints. This paper addresses the problem of efficient reannotation of a large noisy dataset for the RC. Our goal is to catch more annotation errors in the dataset while reannotating fewer instances. Existing work on RC dataset reannotation lacks the flexibility about how much data to reannotate. We introduce the concept of a reannotation budget to overcome this limitation. The immediate follow-up problem is: Given a specific reannotation budget, which subset of the data should we reannotate? To address this problem, we present two strategies to selectively reannotate RC datasets. Our strategies utilize the taxonomic hierarchy of relation labels. The intuition of our work is to rely on the graph distance between actual and predicted relation labels in the label hierarchy graph. We evaluate our reannotation strategies on the well-known TACRED dataset. We design our experiments to answer three specific research questions. First, does our strategy select novel candidates for reannotation? Second, for a given reannotation budget is our reannotation strategy more efficient at catching annotation errors? Third, what is the impact of data reannotation on RC model performance measurement? Experimental results show that our both reannotation strategies are novel and efficient. Our analysis indicates that the current reported performance of RC models on noisy TACRED data is inflated.
CVJul 8, 2020
IQ-VQA: Intelligent Visual Question AnsweringVatsal Goel, Mohit Chandak, Ashish Anand et al.
Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases consistency and robustness of any VQA architecture. We train our models to answer the original question, generate an implication based on the answer and then also learn to answer the generated implication correctly. As a part of the cyclic framework, we propose a novel implication generator which can generate implied questions from any question-answer pair. As a baseline for future works on consistency, we provide a new human annotated VQA-Implications dataset. The dataset consists of ~30k questions containing implications of 3 types - Logical Equivalence, Necessary Condition and Mutual Exclusion - made from the VQA v2.0 validation dataset. We show that our framework improves consistency of VQA models by ~15% on the rule-based dataset, ~7% on VQA-Implications dataset and robustness by ~2%, without degrading their performance. In addition, we also quantitatively show improvement in attention maps which highlights better multi-modal understanding of vision and language.
CVFeb 17, 2020
CQ-VQA: Visual Question Answering on Categorized QuestionsAakansha Mishra, Ashish Anand, Prithwijit Guha
This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end model to solve the task of visual question answering (VQA). The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answer search space. The QC uses attended and fused features of the input question and image. The second level, referred to as answer predictor (AP), comprises of a set of distinct classifiers corresponding to each question category. Depending on the question category predicted by QC, only one of the classifiers of AP remains active. The loss functions of QC and AP are aggregated together to make it an end-to-end model. The proposed model (CQ-VQA) is evaluated on the TDIUC dataset and is benchmarked against state-of-the-art approaches. Results indicate competitive or better performance of CQ-VQA.
CLSep 13, 2019
Taxonomical hierarchy of canonicalized relations from multiple Knowledge BasesAkshay Parekh, Ashish Anand, Amit Awekar
This work addresses two important questions pertinent to Relation Extraction (RE). First, what are all possible relations that could exist between any two given entity types? Second, how do we define an unambiguous taxonomical (is-a) hierarchy among the identified relations? To address the first question, we use three resources Wikipedia Infobox, Wikidata, and DBpedia. This study focuses on relations between person, organization and location entity types. We exploit Wikidata and DBpedia in a data-driven manner, and Wikipedia Infobox templates manually to generate lists of relations. Further, to address the second question, we canonicalize, filter, and combine the identified relations from the three resources to construct a taxonomical hierarchy. This hierarchy contains 623 canonical relations with highest contribution from Wikipedia Infobox followed by DBpedia and Wikidata. The generated relation list subsumes an average of 85% of relations from RE datasets when entity types are restricted.
GNJun 7, 2019
Unsupervised Representation Learning of DNA SequencesVishal Agarwal, N Jayanth Kumar Reddy, Ashish Anand
Recently several deep learning models have been used for DNA sequence based classification tasks. Often such tasks require long and variable length DNA sequences in the input. In this work, we use a sequence-to-sequence autoencoder model to learn a latent representation of a fixed dimension for long and variable length DNA sequences in an unsupervised manner. We evaluate both quantitatively and qualitatively the learned latent representation for a supervised task of splice site classification. The quantitative evaluation is done under two different settings. Our experiments show that these representations can be used as features or priors in closely related tasks such as splice site classification. Further, in our qualitative analysis, we use a model attribution technique Integrated Gradients to infer significant sequence signatures influencing the classification accuracy. We show the identified splice signatures resemble well with the existing knowledge.
CLApr 30, 2019
Fine-grained Entity Recognition with Reduced False Negatives and Large Type CoverageAbhishek Abhishek, Sanya Bathla Taneja, Garima Malik et al.
Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions to a large set of types spanning diverse domains such as biomedical, finance and sports. We observe that when the type set spans several domains, detection of entity mention becomes a limitation for supervised learning models. The primary reason being lack of dataset where entity boundaries are properly annotated while covering a large spectrum of entity types. Our work directly addresses this issue. We propose Heuristics Allied with Distant Supervision (HAnDS) framework to automatically construct a quality dataset suitable for the FgER task. HAnDS framework exploits the high interlink among Wikipedia and Freebase in a pipelined manner, reducing annotation errors introduced by naively using distant supervision approach. Using HAnDS framework, we create two datasets, one suitable for building FgER systems recognizing up to 118 entity types based on the FIGER type hierarchy and another for up to 1115 entity types based on the TypeNet hierarchy. Our extensive empirical experimentation warrants the quality of the generated datasets. Along with this, we also provide a manually annotated dataset for benchmarking FgER systems.
CLOct 20, 2018
Collective Learning From Diverse Datasets for Entity Typing in the WildAbhishek Abhishek, Amar Prakash Azad, Balaji Ganesan et al.
Entity typing (ET) is the problem of assigning labels to given entity mentions in a sentence. Existing works for ET require knowledge about the domain and target label set for a given test instance. ET in the absence of such knowledge is a novel problem that we address as ET in the wild. We hypothesize that the solution to this problem is to build supervised models that generalize better on the ET task as a whole, rather than a specific dataset. In this direction, we propose a Collective Learning Framework (CLF), which enables learning from diverse datasets in a unified way. The CLF first creates a unified hierarchical label set (UHLS) and a label mapping by aggregating label information from all available datasets. Then it builds a single neural network classifier using UHLS, label mapping, and a partial loss function. The single classifier predicts the finest possible label across all available domains even though these labels may not be present in any domain-specific dataset. We also propose a set of evaluation schemes and metrics to evaluate the performance of models in this novel problem. Extensive experimentation on seven diverse real-world datasets demonstrates the efficacy of our CLF.
CLSep 3, 2017
Investigating how well contextual features are captured by bi-directional recurrent neural network modelsKushal Chawla, Sunil Kumar Sahu, Ashish Anand
Learning algorithms for natural language processing (NLP) tasks traditionally rely on manually defined relevant contextual features. On the other hand, neural network models using an only distributional representation of words have been successfully applied for several NLP tasks. Such models learn features automatically and avoid explicit feature engineering. Across several domains, neural models become a natural choice specifically when limited characteristics of data are known. However, this flexibility comes at the cost of interpretability. In this paper, we define three different methods to investigate ability of bi-directional recurrent neural networks (RNNs) in capturing contextual features. In particular, we analyze RNNs for sequence tagging tasks. We perform a comprehensive analysis on general as well as biomedical domain datasets. Our experiments focus on important contextual words as features, which can easily be extended to analyze various other feature types. We also investigate positional effects of context words and show how the developed methods can be used for error analysis.
CLAug 11, 2017
Unified Neural Architecture for Drug, Disease and Clinical Entity RecognitionSunil Kumar Sahu, Ashish Anand
Most existing methods for biomedical entity recognition task rely on explicit feature engineering where many features either are specific to a particular task or depends on output of other existing NLP tools. Neural architectures have been shown across various domains that efforts for explicit feature design can be reduced. In this work we propose an unified framework using bi-directional long short term memory network (BLSTM) for named entity recognition (NER) tasks in biomedical and clinical domains. Three important characteristics of the framework are as follows - (1) model learns contextual as well as morphological features using two different BLSTM in hierarchy, (2) model uses first order linear conditional random field (CRF) in its output layer in cascade of BLSTM to infer label or tag sequence, (3) model does not use any domain specific features or dictionary, i.e., in another words, same set of features are used in the three NER tasks, namely, disease name recognition (Disease NER), drug name recognition (Drug NER) and clinical entity recognition (Clinical NER). We compare performance of the proposed model with existing state-of-the-art models on the standard benchmark datasets of the three tasks. We show empirically that the proposed framework outperforms all existing models. Further our analysis of CRF layer and word-embedding obtained using character based embedding show their importance.
CLAug 11, 2017
What matters in a transferable neural network model for relation classification in the biomedical domain?Sunil Kumar Sahu, Ashish Anand
Lack of sufficient labeled data often limits the applicability of advanced machine learning algorithms to real life problems. However efficient use of Transfer Learning (TL) has been shown to be very useful across domains. TL utilizes valuable knowledge learned in one task (source task), where sufficient data is available, to the task of interest (target task). In biomedical and clinical domain, it is quite common that lack of sufficient training data do not allow to fully exploit machine learning models. In this work, we present two unified recurrent neural models leading to three transfer learning frameworks for relation classification tasks. We systematically investigate effectiveness of the proposed frameworks in transferring the knowledge under multiple aspects related to source and target tasks, such as, similarity or relatedness between source and target tasks, and size of training data for source task. Our empirical results show that the proposed frameworks in general improve the model performance, however these improvements do depend on aspects related to source and target tasks. This dependence then finally determine the choice of a particular TL framework.
CLMay 26, 2017
Biomedical Event Trigger Identification Using Bidirectional Recurrent Neural Network Based ModelsPatchigolla V S S Rahul, Sunil Kumar Sahu, Ashish Anand
Biomedical events describe complex interactions between various biomedical entities. Event trigger is a word or a phrase which typically signifies the occurrence of an event. Event trigger identification is an important first step in all event extraction methods. However many of the current approaches either rely on complex hand-crafted features or consider features only within a window. In this paper we propose a method that takes the advantage of recurrent neural network (RNN) to extract higher level features present across the sentence. Thus hidden state representation of RNN along with word and entity type embedding as features avoid relying on the complex hand-crafted features generated using various NLP toolkits. Our experiments have shown to achieve state-of-art F1-score on Multi Level Event Extraction (MLEE) corpus. We have also performed category-wise analysis of the result and discussed the importance of various features in trigger identification task.
CLMay 15, 2017
Representation learning of drug and disease terms for drug repositioningSahil Manchanda, Ashish Anand
Drug repositioning (DR) refers to identification of novel indications for the approved drugs. The requirement of huge investment of time as well as money and risk of failure in clinical trials have led to surge in interest in drug repositioning. DR exploits two major aspects associated with drugs and diseases: existence of similarity among drugs and among diseases due to their shared involved genes or pathways or common biological effects. Existing methods of identifying drug-disease association majorly rely on the information available in the structured databases only. On the other hand, abundant information available in form of free texts in biomedical research articles are not being fully exploited. Word-embedding or obtaining vector representation of words from a large corpora of free texts using neural network methods have been shown to give significant performance for several natural language processing tasks. In this work we propose a novel way of representation learning to obtain features of drugs and diseases by combining complementary information available in unstructured texts and structured datasets. Next we use matrix completion approach on these feature vectors to learn projection matrix between drug and disease vector spaces. The proposed method has shown competitive performance with state-of-the-art methods. Further, the case studies on Alzheimer's and Hypertension diseases have shown that the predicted associations are matching with the existing knowledge.
CLFeb 22, 2017
Fine-Grained Entity Type Classification by Jointly Learning Representations and Label EmbeddingsAbhishek, Ashish Anand, Amit Awekar
Fine-grained entity type classification (FETC) is the task of classifying an entity mention to a broad set of types. Distant supervision paradigm is extensively used to generate training data for this task. However, generated training data assigns same set of labels to every mention of an entity without considering its local context. Existing FETC systems have two major drawbacks: assuming training data to be noise free and use of hand crafted features. Our work overcomes both drawbacks. We propose a neural network model that jointly learns entity mentions and their context representation to eliminate use of hand crafted features. Our model treats training data as noisy and uses non-parametric variant of hinge loss function. Experiments show that the proposed model outperforms previous state-of-the-art methods on two publicly available datasets, namely FIGER (GOLD) and BBN with an average relative improvement of 2.69% in micro-F1 score. Knowledge learnt by our model on one dataset can be transferred to other datasets while using same model or other FETC systems. These approaches of transferring knowledge further improve the performance of respective models.
CLJan 28, 2017
Drug-Drug Interaction Extraction from Biomedical Text Using Long Short Term Memory NetworkSunil Kumar Sahu, Ashish Anand
Simultaneous administration of multiple drugs can have synergistic or antagonistic effects as one drug can affect activities of other drugs. Synergistic effects lead to improved therapeutic outcomes, whereas, antagonistic effects can be life-threatening, may lead to increased healthcare cost, or may even cause death. Thus identification of unknown drug-drug interaction (DDI) is an important concern for efficient and effective healthcare. Although multiple resources for DDI exist, they are often unable to keep pace with rich amount of information available in fast growing biomedical texts. Most existing methods model DDI extraction from text as a classification problem and mainly rely on handcrafted features. Some of these features further depend on domain specific tools. Recently neural network models using latent features have been shown to give similar or better performance than the other existing models dependent on handcrafted features. In this paper, we present three models namely, {\it B-LSTM}, {\it AB-LSTM} and {\it Joint AB-LSTM} based on long short-term memory (LSTM) network. All three models utilize word and position embedding as latent features and thus do not rely on explicit feature engineering. Further use of bidirectional long short-term memory (Bi-LSTM) networks allow implicit feature extraction from the whole sentence. The two models, {\it AB-LSTM} and {\it Joint AB-LSTM} also use attentive pooling in the output of Bi-LSTM layer to assign weights to features. Our experimental results on the SemEval-2013 DDI extraction dataset show that the {\it Joint AB-LSTM} model outperforms all the existing methods, including those relying on handcrafted features. The other two proposed LSTM models also perform competitively with state-of-the-art methods.
CLJun 30, 2016
Recurrent neural network models for disease name recognition using domain invariant featuresSunil Kumar Sahu, Ashish Anand
Hand-crafted features based on linguistic and domain-knowledge play crucial role in determining the performance of disease name recognition systems. Such methods are further limited by the scope of these features or in other words, their ability to cover the contexts or word dependencies within a sentence. In this work, we focus on reducing such dependencies and propose a domain-invariant framework for the disease name recognition task. In particular, we propose various end-to-end recurrent neural network (RNN) models for the tasks of disease name recognition and their classification into four pre-defined categories. We also utilize convolution neural network (CNN) in cascade of RNN to get character-based embedded features and employ it with word-embedded features in our model. We compare our models with the state-of-the-art results for the two tasks on NCBI disease dataset. Our results for the disease mention recognition task indicate that state-of-the-art performance can be obtained without relying on feature engineering. Further the proposed models obtained improved performance on the classification task of disease names.
CLJun 30, 2016
Relation extraction from clinical texts using domain invariant convolutional neural networkSunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty et al.
In recent years extracting relevant information from biomedical and clinical texts such as research articles, discharge summaries, or electronic health records have been a subject of many research efforts and shared challenges. Relation extraction is the process of detecting and classifying the semantic relation among entities in a given piece of texts. Existing models for this task in biomedical domain use either manually engineered features or kernel methods to create feature vector. These features are then fed to classifier for the prediction of the correct class. It turns out that the results of these methods are highly dependent on quality of user designed features and also suffer from curse of dimensionality. In this work we focus on extracting relations from clinical discharge summaries. Our main objective is to exploit the power of convolution neural network (CNN) to learn features automatically and thus reduce the dependency on manual feature engineering. We evaluate performance of the proposed model on i2b2-2010 clinical relation extraction challenge dataset. Our results indicate that convolution neural network can be a good model for relation exaction in clinical text without being dependent on expert's knowledge on defining quality features.
IROct 11, 2015
Inferring disease correlation from healthcare dataGargi Priyadarshini, Ashish Anand
Electronic Health Records maintained in health care settings are a potential source of substantial clinical knowledge. The massive volume of data, unstructured nature of records and obligatory requirement of domain acquaintance together pose a challenge in knowledge extraction from it. The aim of this study is to overcome this challenge with a methodical analysis, abstraction and summarization of such data. This is an attempt to explain clinical observations through bio-medical and genomic data. Discharge summaries of obesity patients were processed to extract coherent patterns. This was supported by Machine Learning and Natural Language Processing based technologies and concept mapping tool along with biomedical, clinical and genomic knowledge bases. Semantic relations between diseases were extracted and filtered through Chi square test to remove spurious relations. The remaining relations were validated against biomedical literature and gene interaction networks. A collection of binary relations of diseases was derived from the data. One set implied co-morbidity while the other set contained diseases which are risk factors of others. Validation against bio-medical literature increased the prospect of correlation between diseases. Gene interaction network revealed that the diseases are related and their corresponding genes are in close proximity. Conclusion: This study focuses on deducing meaningful relations between diseases from discharge summaries. For analytical purpose, the scope has been limited to a few common, well-researched diseases. It can be extended to incorporate relatively unknown, complex diseases and discover new traits to help in clinical assessments.