Deepak Gupta

CL
h-index: 36
48 papers
5,955 citations
Novelty: 40%
AI Score: 58

48 Papers

LG · Jun 13, 2023 · Code
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

Arnav Chavan, Zhuang Liu, Deepak Gupta et al.

We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns an individual adapter for each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities, as it adapts to new tasks through not only weights but also additional dimensions like activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods in natural, specialized, and structured vision benchmarks, achieving superior accuracy with fewer parameters and computations. The proposed method on LLaMA-1 and LLaMA-2 also shows considerable enhancements compared to the original LoRA in the language domain. Furthermore, our structural re-parameterization design ensures that GLoRA incurs no extra inference cost, rendering it a practical solution for resource-limited applications. Code and models are available at: https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.
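
The "no extra inference cost" claim rests on re-parameterization: a low-rank update can be folded into the frozen weight once training is done. The sketch below illustrates this for plain LoRA only (y = Wx + BAx), not for GLoRA's generalized formulation, which the abstract does not spell out. The helper names (`matmul`, `add`, `lora_forward`, `merge`) and the toy shapes are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_forward(W, A, B, x):
    """Adapter kept separate at inference: y = W x + B (A x)."""
    return add(matmul(W, x), matmul(B, matmul(A, x)))

def merge(W, A, B):
    """Fold the low-rank update into the frozen weight: W' = W + B A."""
    return add(W, matmul(B, A))

random.seed(0)
d_out, d_in, r = 4, 3, 2  # hypothetical toy sizes; r is the adapter rank
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]
x = [[random.gauss(0, 1)] for _ in range(d_in)]  # input as a column vector

y_adapter = lora_forward(W, A, B, x)
y_merged = matmul(merge(W, A, B), x)  # a single matmul, same cost as the base layer

# Largest absolute difference between the two outputs (floating-point noise only).
max_gap = max(abs(p[0] - q[0]) for p, q in zip(y_adapter, y_merged))
```

After merging, the layer is an ordinary dense layer of the original shape, so the adapter adds no parameters or latency at inference time.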

CV · Oct 5, 2022 · Code
Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features

Deepak Gupta, Russell Loane, Soumya Gayen et al.

Nearest neighbor search (NNS) aims to locate the points in high-dimensional space that are closest to the query point. The brute-force approach for finding the nearest neighbor becomes computationally infeasible when the number of points is large. NNS has multiple applications in medicine, such as searching large medical imaging databases, disease classification, diagnosis, etc. With a focus on medical imaging, this paper proposes DenseLinkSearch, an effective and efficient algorithm that searches and retrieves the relevant images from heterogeneous sources of medical images. Towards this, given a medical database, the proposed algorithm builds the index that consists of pre-computed links of each point in the database. The search algorithm utilizes the index to efficiently traverse the database in search of the nearest neighbor. We extensively tested the proposed NNS approach and compared the performance with state-of-the-art NNS approaches on benchmark datasets and our created medical image datasets. The proposed approach outperformed the existing approach in terms of retrieving accurate neighbors and retrieval speed. We also explore the role of medical image feature representation in content-based medical image retrieval tasks. We propose a Transformer-based feature representation technique that outperformed the existing pre-trained Transformer approach on the CLEF 2011 medical image retrieval task. The source code of our experiments is available at https://github.com/deepaknlp/DLS.
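
For reference, the brute-force baseline mentioned in the abstract can be sketched as a linear scan over the database. This is not DenseLinkSearch itself, whose index construction is not described here. The function name `brute_force_nn` and the toy feature vectors are hypothetical.

```python
import math

def brute_force_nn(points, query):
    """Return (index, distance) of the point closest to `query` (Euclidean).

    Cost is O(N * d) per query for N points of dimension d, which is what
    becomes infeasible for large databases.
    """
    best_idx, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, query)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist

# Hypothetical 2-D "image features"; real features would be high-dimensional.
features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
idx, dist = brute_force_nn(features, [0.9, 1.2])
```

Index-based methods aim to return the same neighbor while visiting only a small fraction of the points.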

LG · Jun 1
DOT-MoE: Differentiable Optimal Transport for MoEfication

Udbhav Bamba, Arnav Chavan, Aryamaan Thakur et al.

The scaling of Large Language Models (LLMs) has driven significant performance gains but created substantial challenges in inference efficiency. While Mixture-of-Experts (MoE) architectures address this by decoupling model size from inference cost, training MoEs from scratch is often unstable and compute-intensive. Conversion of pre-trained dense models into sparse MoEs has emerged as an alternative solution; however, existing methods typically rely on heuristic neuron clustering or random splitting to partition the Feed-Forward Network (FFN) into experts. In this work, we propose DOT-MoE, a novel framework that formulates the decomposition of dense layers as a Differentiable Optimal Transport (DOT) problem. Instead of static heuristics, we model neuron assignment as a balanced transport problem, utilizing differentiable Sinkhorn-Knopp iterations to enforce strict expert capacity constraints. Furthermore, we utilize Straight-Through Estimators (STE) to jointly learn the discrete neuron-to-expert assignment and the token-to-expert routing policy end-to-end. Extensive experiments across multiple architectures and benchmarks demonstrate that DOT-MoE significantly outperforms structured pruning, heuristic clustering, and random-split baselines, retaining 90% of the original dense model's performance while reducing active parameters by 50%.
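
A minimal sketch of Sinkhorn-Knopp iterations for a balanced assignment, assuming N neurons each carry one unit of mass and each of E experts has capacity N/E. The affinity scores, the temperature `eps`, and the function name `sinkhorn` are hypothetical. The paper's actual cost function and the Straight-Through Estimator step are not shown.

```python
import math

def sinkhorn(scores, n_iters=200, eps=0.5):
    """Soft neuron-to-expert plan: rows sum to 1, columns sum to N / E."""
    n, e = len(scores), len(scores[0])
    cap = n / e  # balanced capacity per expert
    # Positive kernel: a higher affinity score gives a larger initial mass.
    P = [[math.exp(s / eps) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every expert receives exactly `cap` units of mass.
        col = [sum(P[i][j] for i in range(n)) for j in range(e)]
        P = [[P[i][j] * cap / col[j] for j in range(e)] for i in range(n)]
        # Row step: every neuron distributes exactly one unit of mass.
        P = [[v / sum(row) for v in row] for row in P]
    return P

# Four neurons that all prefer expert 0 to different degrees (made-up scores).
scores = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0], [0.5, 0.0]]
plan = sinkhorn(scores)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(4)) for j in range(2)]
```

Although every neuron prefers expert 0, the column constraint forces half of the total mass onto expert 1, so the neurons with the weakest preference are the ones shifted there. Each step is a differentiable rescaling, which is what allows such a plan to sit inside a trained network.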

CL · Jun 14, 2022
CHQ-Summ: A Dataset for Consumer Healthcare Question Summarization

Shweta Yadav, Deepak Gupta, Dina Demner-Fushman

The quest for seeking health information has swamped the web with consumers' health-related questions. Generally, consumers use overly descriptive and peripheral information to express their medical condition or other healthcare needs, contributing to the challenges of natural language understanding. One way to address this challenge is to summarize the questions and distill the key information of the original question. To address this issue, we introduce a new dataset, CHQ-Summ, which contains 1507 domain-expert annotated consumer health questions and corresponding summaries. The dataset is derived from the community question-answering forum and therefore provides a valuable resource for understanding consumer health-related posts on social media. We benchmark the dataset on multiple state-of-the-art summarization models to show the effectiveness of the dataset.

LG · Mar 9, 2023 · Code
Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts

Rohit Agarwal, Deepak Gupta, Alexander Horsch et al.

Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., contains missing features, features becoming obsolete in time, the appearance of new features at later points in time and a lack of clarity on the total number of input features. These challenges make it hard to build a learnable system for such applications, and almost no work exists in deep learning that addresses this issue. In this paper, we present Aux-Drop, an auxiliary dropout regularization strategy for online learning that handles the haphazard input features in an effective manner. Aux-Drop adapts the conventional dropout regularization scheme for the haphazard input feature space, ensuring that the final output is minimally impacted by the chaotic appearance of such features. It helps to prevent co-adaptation, especially of the auxiliary and base features, and reduces the strong dependence of the output on any of the auxiliary inputs of the model. This helps in better learning for scenarios where certain features disappear in time or when new features are to be modelled. The efficacy of Aux-Drop has been demonstrated through extensive numerical experiments on SOTA benchmarking datasets that include Italy Power Demand, HIGGS, SUSY and multiple UCI datasets. The code is available at https://github.com/Rohit102497/Aux-Drop.

IR · Nov 15, 2022
Machine Learning enabled models for YouTube Ranking Mechanism and Views Prediction

Vandit Gupta, Akshit Diwan, Chaitanya Chadha et al.

With the continuous increase of internet usage today, everyone is influenced by the power of this technology. Due to this, the rise of applications and games is unstoppable. A major percentage of our population uses these applications for multiple purposes. These range from education, communication, and news to entertainment and many more. Of these, the application that is making sure that the world stays in touch with each other and with current affairs is social media. Social media applications have seen a boom in the last 10 years with the introduction of smartphones and the internet being available at affordable prices. Applications like Twitch and YouTube are some of the best platforms for creators to produce content and express their talent as well. It is the goal of every content creator to post the best and most reliable content so that they can gain recognition. It is important to know the methods of achieving popularity easily, which is what this paper proposes to bring to the spotlight. There should be certain parameters based on which the reach of content could be multiplied by a good factor. The proposed research work aims to identify and estimate the reach, popularity, and views of a YouTube video from certain features, using machine learning and AI techniques. A ranking system would also be used, taking trending videos into consideration. This would eventually help content creators know how authentic their content is, and would encourage healthy competition to make better content before uploading videos to the platform.

CL · Sep 21, 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches

Deepak Gupta, Kush Attal, Dina Demner-Fushman

The increase in the availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos as they offer a series of step-by-step procedures to accomplish particular tasks. The instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical education questions. Toward this, this paper is focused on answering health-related questions asked by the public by providing visual answers from medical videos. The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions. To address this issue, we first proposed a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. Later, we proposed monomodal and multimodal approaches that can effectively provide visual answers from medical videos to natural language questions. We conducted a comprehensive analysis of the results, focusing on the impact of the created datasets on model training and the significance of visual features in enhancing the performance of the monomodal and multimodal approaches. Our findings suggest that these datasets have the potential to enhance the performance of medical visual answer localization tasks and provide a promising future direction to further enhance the performance by using pre-trained language-vision models.

LG · Nov 15, 2022
Classifying text using machine learning models and determining conversation drift

Chaitanya Chadha, Vandit Gupta, Deepak Gupta et al.

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning and their relevance. Text classification is a method of categorising documents. It combines computer text classification and natural language processing to analyse text in aggregate. This method provides a descriptive categorization of the text, with features like content type, object field, lexical characteristics, and style traits. In this research, the authors use natural language feature extraction methods to train some of the basic machine learning models, such as Naive Bayes, Logistic Regression, and Support Vector Machine. These models are used to detect when a teacher must get involved in a discussion because the conversation has drifted off-topic.

LG · Sep 15, 2023
VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference

Soumya Banerjee, Vinay K. Verma, Avideep Mukherjee et al.

Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming (observes each training example only once), requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these, we propose a novel virtual-gradients-based approach for continual representation learning which adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting. Our approach also leverages an exponential-moving-average-based semantic memory to further enhance performance. Experiments on diverse datasets with temporally correlated observations demonstrate our method's efficacy and superior performance over existing methods.
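
The exponential-moving-average memory mentioned above can be illustrated with a generic EMA update over parameters. This is a sketch of the standard rule only; how VERSE schedules or applies it is not stated in the abstract. The function name `ema_update` and the decay value are assumptions.

```python
def ema_update(slow, fast, decay=0.99):
    """Move each slow ("semantic memory") parameter a small step toward the fast one.

    slow_new = decay * slow + (1 - decay) * fast
    """
    return [decay * s + (1.0 - decay) * f for s, f in zip(slow, fast)]

# Toy parameters; decay=0.5 is chosen only to make the arithmetic easy to follow.
slow = [0.0, 0.0]
fast = [1.0, -2.0]
for _ in range(3):
    slow = ema_update(slow, fast, decay=0.5)
# After k steps toward a fixed target, slow = fast * (1 - decay**k).
```

A decay close to 1 makes the slow copy change gradually, which is why such averages are commonly used as a stable counterpart to rapidly updated weights.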

CV · Jun 1, 2023
Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes

Anant Khandelwal, Happy Mittal, Shreyas Sunil Kulkarni et al.

E-commerce websites (e.g. Amazon) have a plethora of structured and unstructured information (text and images) present on the product pages. Sellers often either don't label or mislabel values of the attributes (e.g. color, size etc.) for their products. Automatically identifying these attribute values from an e-commerce product page that contains both text and images is a challenging task, especially when the attribute value is not explicitly mentioned in the catalog. In this paper, we present a scalable solution for this problem where we pose the attribute extraction problem as a question-answering task, which we solve using MXT, consisting of three key components: (i) MAG (Multimodal Adaptation Gate), (ii) Xception network, and (iii) T5 encoder-decoder. Our system consists of a generative model that generates attribute values for a given product by using both textual and visual characteristics (e.g. images) of the product. We show that our system is capable of handling zero-shot attribute prediction (when attribute value is not seen in training data) and value-absent prediction (when attribute value is not mentioned in the text), which are missing in traditional classification-based and NER-based models respectively. We have trained our models using distant supervision, removing dependency on human labeling, thus making them practical for real-world applications. With this framework, we are able to train a single model for 1000s of (product-type, attribute) pairs, thus reducing the overhead of training and maintaining separate models. Extensive experiments on two real-world datasets show that our framework improves the absolute recall@90P by 10.16% and 6.9% over the existing state-of-the-art models. In a popular e-commerce store, we have deployed our models for 1000s of (product-type, attribute) pairs.

LG · Feb 16
S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations

Arnav Chavan, Nahush Lele, Udbhav Bamba et al. · amazon-science

Activation outliers in large-scale transformer models pose a fundamental challenge to model quantization, creating excessively large ranges that cause severe accuracy drops during quantization. We empirically observe that outlier severity intensifies with pre-training scale (e.g., progressing from CLIP to the more extensively trained SigLIP and SigLIP2). Through theoretical analysis as well as empirical correlation studies, we establish the direct link between these activation outliers and dominant singular values of the weights. Building on this insight, we propose Selective Spectral Decay ($S^2D$), a geometrically-principled conditioning method that surgically regularizes only the weight components corresponding to the largest singular values during fine-tuning. Through extensive experiments, we demonstrate that $S^2D$ significantly reduces activation outliers and produces well-conditioned representations that are inherently quantization-friendly. Models trained with $S^2D$ achieve up to 7% improved PTQ accuracy on ImageNet under W4A4 quantization and 4% gains when combined with QAT. These improvements also generalize across downstream tasks and vision-language models, enabling the scaling of increasingly large and rigorously trained models without sacrificing deployment efficiency.

CL · Nov 14, 2022
Learning to Answer Multilingual and Code-Mixed Questions

Deepak Gupta

Question-answering (QA) that comes naturally to humans is a critical component in seamless human-computer interaction. It has emerged as one of the most convenient and natural methods to interact with the web and is especially desirable in voice-controlled environments. Despite being one of the oldest research areas, the current QA system faces the critical challenge of handling multilingual queries. To build an Artificial Intelligence (AI) agent that can serve multilingual end users, a QA system is required to be language versatile and tailored to suit the multilingual environment. Recent advances in QA models have enabled surpassing human performance primarily due to the availability of a sizable amount of high-quality datasets. However, the majority of such annotated datasets are expensive to create and are only confined to the English language, making it challenging to acknowledge progress in foreign languages. Therefore, to measure a similar improvement in the multilingual QA system, it is necessary to invest in high-quality multilingual evaluation benchmarks. In this dissertation, we focus on advancing QA techniques for handling end-user queries in multilingual environments. This dissertation consists of two parts. In the first part, we explore multilingualism and a new dimension of multilingualism referred to as code-mixing. Second, we propose a technique to solve the task of multi-hop question generation by exploiting multiple documents. Experiments show our models achieve state-of-the-art performance on answer extraction, ranking, and generation tasks on multiple domains of MQA, VQA, and language generation. The proposed techniques are generic and can be widely used in various domains and languages to advance QA systems.

LG · Feb 2, 2024 · Code
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward

Arnav Chavan, Raghav Magazine, Shubham Kushwaha et al.

Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey

QM · Aug 21, 2024
Bioimpedance a Diagnostic Tool for Tobacco Induced Oral Lesions: a Mixed Model cross-sectional study

Vaibhav Gupta, Poonam Goel, Usha Agrawal et al.

Introduction: Electrical impedance spectroscopy (EIS) has recently been developed as a novel diagnostic device for screening and evaluating cervical dysplasia, prostate cancer, breast cancer and basal cell carcinoma. The current study aimed to validate and evaluate bioimpedance as a diagnostic tool for tobacco-induced oral lesions. Methodology: The study comprised 50 OSCC and OPMD tissue specimens for the in vitro study and 320 subjects for the in vivo study. A bioimpedance device was prepared and calibrated. EIS measurements were taken for the habit and control groups and compared. Results: The impedance value in the control group was significantly higher compared to the OPMD and OSCC groups. Diagnosis based on BIS measurements has a sensitivity of 95.9% and a specificity of 86.7%. Conclusion: A bioimpedance device can help in decision-making for differentiating OPMD and OSCC cases and their management, especially in primary healthcare settings. Keywords: Impedance, Cancer, Diagnosis, Device, Community
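
For readers unfamiliar with the reported metrics, sensitivity and specificity follow directly from confusion-matrix counts. The counts below are hypothetical and chosen for easy arithmetic; they are not the study's data, and the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)  # fraction of diseased cases correctly flagged
    specificity = tn / (tn + fp)  # fraction of healthy cases correctly cleared
    return sensitivity, specificity

# Hypothetical counts, NOT taken from the study.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
```

A high sensitivity means few lesions are missed, while a high specificity means few healthy subjects are referred unnecessarily.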

CL · May 17, 2024 · Code
Surgical Feature-Space Decomposition of LLMs: Why, When and How?

Arnav Chavan, Nahush Lele, Deepak Gupta

Low-rank approximations of the weight and feature space can enhance the performance of deep learning models, whether in terms of improving generalization or reducing the latency of inference. However, there is no clear consensus yet on how, when and why these approximations are helpful for large language models (LLMs). In this work, we empirically study the efficacy of weight and feature space decomposition in transformer-based LLMs. We demonstrate that surgical decomposition not only provides critical insights into the trade-off between compression and language modelling performance, but also sometimes enhances commonsense reasoning performance of LLMs. Our empirical analysis identifies specific network segments that intrinsically exhibit a low-rank structure. Furthermore, we extend our investigation to the implications of low-rank approximations on model bias. Overall, our findings offer a novel perspective on optimizing LLMs, presenting the low-rank approximation not only as a tool for performance enhancements, but also as a means to potentially rectify biases within these models. Our code is available at https://github.com/nyunAI/SFSD-LLM.

CL · Dec 29, 2025
A Dataset and Benchmark for Consumer Healthcare Question Summarization

Abhishek Basu, Deepak Gupta, Dina Demner-Fushman et al.

The quest for seeking health information has swamped the web with consumers' health-related questions. Generally, consumers use overly descriptive and peripheral information to express their medical condition or other healthcare needs, contributing to the challenges of natural language understanding. One way to address this challenge is to summarize the questions and distill the key information of the original question. Recently, large-scale datasets have significantly propelled the development of several summarization tasks, such as multi-document summarization and dialogue summarization. However, a lack of a domain-expert annotated dataset for the consumer healthcare questions summarization task inhibits the development of an efficient summarization system. To address this issue, we introduce a new dataset, CHQ-Summ, which contains 1507 domain-expert annotated consumer health questions and corresponding summaries. The dataset is derived from the community question answering forum and therefore provides a valuable resource for understanding consumer health-related posts on social media. We benchmark the dataset on multiple state-of-the-art summarization models to show the effectiveness of the dataset.

CL · Feb 4 · Code
BioACE: An Automated Framework for Biomedical Answer and Citation Evaluations

Deepak Gupta, Davis Bartels, Dina Demner-Fushman

With the increasing use of large language models (LLMs) for generating answers to biomedical questions, it is crucial to evaluate the quality of the generated answers and the references provided to support the facts in the generated answers. Evaluation of text generated by LLMs remains a challenge for question answering, retrieval-augmented generation (RAG), summarization, and many other natural language processing tasks in the biomedical domain, due to the requirements of expert assessment to verify consistency with the scientific literature and complex medical terminology. In this work, we propose BioACE, an automated framework for evaluating biomedical answers and citations against the facts stated in the answers. The proposed BioACE framework considers multiple aspects, including completeness, correctness, precision, and recall, in relation to the ground-truth nuggets for answer evaluation. We developed automated approaches to evaluate each of the aforementioned aspects and performed extensive experiments to assess and analyze their correlation with human evaluations. In addition, we considered multiple existing approaches, such as natural language inference (NLI) and pre-trained language models and LLMs, to evaluate the quality of evidence provided to support the generated answers in the form of citations into biomedical literature. With the detailed experiments and analysis, we provide the best approaches for biomedical answer and citation evaluation as a part of BioACE (https://github.com/deepaknlp/BioACE) evaluation package.

IR · Mar 23
Overview of TREC 2025 Biomedical Generative Retrieval (BioGen) Track

Deepak Gupta, Dina Demner-Fushman, William Hersh et al.

Recent advances in large language models (LLMs) have made significant progress across multiple biomedical tasks, including biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. These models have demonstrated strong capabilities in processing and synthesizing complex biomedical information and in generating fluent, human-like responses. Despite these advancements, hallucinations or confabulations remain key challenges when using LLMs in biomedical and other high-stakes domains. Inaccuracies may be particularly harmful in high-risk situations, such as medical question answering, making clinical decisions, or appraising biomedical research. Studies on the evaluation of the LLMs' abilities to ground generated statements in verifiable sources have shown that models perform significantly

LG · Dec 9, 2025
Wavelet-Accelerated Physics-Informed Quantum Neural Network for Multiscale Partial Differential Equations

Deepak Gupta, Himanshu Pandey, Ratikanta Behera

This work proposes a wavelet-based physics-informed quantum neural network framework to efficiently address multiscale partial differential equations that involve sharp gradients, stiffness, rapid local variations, and highly oscillatory behavior. Traditional physics-informed neural networks (PINNs) have demonstrated substantial potential in solving differential equations, and their quantum counterparts, quantum-PINNs, exhibit enhanced representational capacity with fewer trainable parameters. However, both approaches face notable challenges in accurately solving multiscale features. Furthermore, their reliance on automatic differentiation for constructing loss functions introduces considerable computational overhead, resulting in longer training times. To overcome these challenges, we developed a wavelet-accelerated physics-informed quantum neural network that eliminates the need for automatic differentiation, significantly reducing computational complexity. The proposed framework incorporates the multiresolution property of wavelets within the quantum neural network architecture, thereby enhancing the network's ability to effectively capture both local and global features of multiscale problems. Numerical experiments demonstrate that our proposed method achieves superior accuracy while requiring less than five percent of the trainable parameters compared to classical wavelet-based PINNs, resulting in faster convergence. Moreover, it offers a speedup of three to five times compared to existing quantum PINNs, highlighting the potential of the proposed approach for efficiently solving challenging multiscale and oscillatory problems.

CV · Jul 4, 2025
NOVO: Unlearning-Compliant Vision Transformers

Soumya Roy, Soumya Banerjee, Vinay Verma et al.

Machine unlearning (MUL) refers to the problem of making a pre-trained model selectively forget some training instances or class(es) while retaining performance on the remaining dataset. Existing MUL research involves fine-tuning using a forget and/or retain set, making it expensive and/or impractical, and often causing performance degradation in the unlearned model. We introduce NOVO, an unlearning-aware vision transformer-based architecture that can directly perform unlearning for future unlearning requests without any fine-tuning over the requested set. The proposed model is trained by simulating unlearning during the training process itself. It involves randomly separating class(es)/sub-class(es) present in each mini-batch into two disjoint sets: a proxy forget-set and a retain-set, and the model is optimized so that it is unable to predict the forget-set. Forgetting is achieved by withdrawing keys, making unlearning on-the-fly and avoiding performance degradation. The model is trained jointly with learnable keys and original weights, ensuring withholding a key irreversibly erases information, validated by membership inference attack scores. Extensive experiments on various datasets, architectures, and resolutions confirm NOVO's superiority over both fine-tuning-free and fine-tuning-based methods.

CV · Dec 15, 2024
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track

Deepak Gupta, Dina Demner-Fushman

One of the key goals of artificial intelligence (AI) is the development of a multimodal system that facilitates communication with the visual world (image and video) using a natural language query. Earlier works on medical question answering primarily focused on textual and visual (image) modalities, which may be inefficient in answering questions requiring demonstration. In recent years, significant progress has been achieved due to the introduction of large-scale language-vision datasets and the development of efficient deep neural techniques that bridge the gap between language and visual understanding. Improvements have been made in numerous vision-and-language tasks, such as visual captioning, visual question answering, and natural language video localization. Most of the existing work on language vision focused on creating datasets and developing solutions for open-domain applications. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. With increasing interest in AI to support clinical decision-making and improve patient engagement, there is a need to explore such challenges and develop efficient algorithms for medical language-video understanding and generation. Toward this, we introduced new tasks to foster research toward designing systems that can understand medical videos to provide visual answers to natural language questions, and are equipped with multimodal capability to generate instruction steps from the medical video. These tasks have the potential to support the development of sophisticated downstream applications that can benefit the public and medical professionals.

LG · Dec 12, 2023
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models

Arnav Chavan, Nahush Lele, Deepak Gupta

Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges, particularly on consumer-grade hardware. This paper introduces an innovative approach for the parametric and practical compression of LLMs based on reduced order modelling, which entails low-rank decomposition within the feature space and re-parameterization in the weight space. Notably, this compression technique operates in a layer-wise manner, obviating the need for a GPU device and enabling the compression of billion-scale models within stringent constraints of both memory and time. Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.

CLMar 8, 2025
MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering

Vinay Kumar Verma, Shreyas Sunil Kulkarni, Happy Mittal et al.

Question Answering (QA) and Visual Question Answering (VQA) are well-studied problems in the language and vision domain. One challenging scenario involves multiple sources of information, each of a different modality, where the answer to the question may exist in one or more sources. This scenario contains richer information but is highly complex to handle. In this work, we formulate a novel question-answer generation (QAG) framework in an environment containing multi-source, multimodal information. The answer may belong to any or all sources; therefore, selecting the most prominent answer source or an optimal combination of all sources for a given question is challenging. To address this issue, we propose a question-guided attention mechanism that learns attention across multiple sources and decodes this information for robust and unbiased answer generation. To learn attention within each source, we introduce an explicit alignment between questions and various information sources, which facilitates identifying the most pertinent parts of the source information relative to the question. Scalability in handling diverse questions poses a challenge. We address this by extending our model to a sparse mixture-of-experts (sparse-MoE) framework, enabling it to handle thousands of question types. Experiments on T5 and Flan-T5 using three datasets demonstrate the model's efficacy, supported by ablation studies.
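
The sparse mixture-of-experts extension mentioned above can be illustrated with a generic top-k routing step. This is a sketch of standard sparse-MoE gating, not the paper's architecture: the gate input is assumed to be a question embedding, and `sparse_moe`, the gate weights, and the toy experts are all hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_moe(question_vec, gate_weights, experts, k=2):
    """Route the input to the k experts with the highest gate scores."""
    # One gate logit per expert: dot product of its gate row with the input.
    logits = [sum(w * x for w, x in zip(row, question_vec)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    # Renormalize over the selected experts only; the rest are never evaluated.
    probs = softmax([logits[e] for e in top])
    output = [0.0] * len(question_vec)
    for prob, expert_id in zip(probs, top):
        expert_out = experts[expert_id](question_vec)
        output = [o + prob * y for o, y in zip(output, expert_out)]
    return output, dict(zip(top, probs))

# Toy experts that simply scale the input by a fixed factor (hypothetical).
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
question_vec = [1.0, 2.0]

output, routing = sparse_moe(question_vec, gate_weights, experts, k=2)
```

Because only `k` experts run per question, the number of experts can grow without a matching growth in per-question compute, which is the usual motivation for sparse routing.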

LGOct 22, 2024
Enhancing Deep Learning based RMT Data Inversion using Gaussian Random Field

Koustav Ghosal, Arun Singh, Samir Malakar et al.

Deep learning (DL) methods have emerged as a powerful tool for the inversion of geophysical data. When applied to field data, these models often struggle without additional fine-tuning of the network. This is because they are built on the assumption that the statistical patterns in the training and test datasets are the same. To address this, we propose a DL-based inversion scheme for Radio Magnetotelluric data where the subsurface resistivity models are generated using Gaussian Random Fields (GRF). The network's generalization ability was tested with an out-of-distribution (OOD) dataset comprising a homogeneous background and various rectangular-shaped anomalous bodies. After end-to-end training with the GRF dataset, the pre-trained network successfully identified anomalies in the OOD dataset. Synthetic experiments confirmed that the GRF dataset enhances generalization compared to a homogeneous background OOD dataset. The network accurately recovered structures in a checkerboard resistivity model, and demonstrated robustness to noise, outperforming traditional gradient-based methods. Finally, the developed scheme is tested using exemplary field data from a waste site near Roorkee, India. The proposed scheme enhances generalization in a data-driven supervised learning framework, suggesting a promising direction for OOD generalization in DL methods.
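
One simple way to draw a Gaussian random field, sketched here under stated assumptions, is to smooth white Gaussian noise with a Gaussian kernel. The abstract does not say how the GRF resistivity models were generated, so the smoothing approach, the periodic boundaries, the function name `gaussian_random_field`, and the mapping to log-resistivity (mean 2.0, spread 0.5 in log10 ohm-m) are all illustrative choices.

```python
import math
import random

def gaussian_random_field(n, length_scale, seed=0):
    """Return an n x n zero-mean, unit-variance correlated Gaussian field."""
    rng = random.Random(seed)
    radius = max(1, int(3 * length_scale))
    if 2 * radius + 1 > n:
        raise ValueError("grid too small for this length scale")
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (d / length_scale) ** 2) for d in offsets]
    norm = math.sqrt(sum(k * k for k in kernel))
    kernel = [k / norm for k in kernel]  # unit L2 norm keeps the variance at 1

    def smooth_rows(grid):
        # 1-D convolution along each row with periodic wrap-around.
        return [[sum(k * row[(j + d) % n] for k, d in zip(kernel, offsets))
                 for j in range(n)] for row in grid]

    def transpose(grid):
        return [list(col) for col in zip(*grid)]

    # A separable Gaussian filter: smooth along rows, then along columns.
    field = smooth_rows(noise)
    return transpose(smooth_rows(transpose(field)))

n = 16
field = gaussian_random_field(n, length_scale=2.0, seed=1)
# Hypothetical mapping of the field to a subsurface resistivity model.
log10_rho = [[2.0 + 0.5 * v for v in row] for row in field]
resistivity = [[10.0 ** v for v in row] for row in log10_rho]
```

Varying `length_scale` and the seed gives training models with smooth structures of different sizes, in contrast to the homogeneous-background models with rectangular anomalies used as the OOD test set.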

IRMar 19, 2024
Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

Ying-Chun Lin, Jennifer Neville, Jack W. Stokes et al.

Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.

LGFeb 19, 2024
Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures

Akash Guna R. T, Arnav Chavan, Deepak Gupta

Conventional scaling of neural networks typically involves designing a base network and growing different dimensions like width, depth, etc. of the same by some predefined scaling factors. We introduce an automated scaling approach leveraging second-order loss landscape information. Our method is flexible towards skip connections, a mainstay in modern vision transformers. Our training-aware method jointly scales and trains transformers without additional training iterations. Motivated by the hypothesis that not all neurons need uniform depth complexity, our approach embraces depth heterogeneity. Extensive evaluations on DeiT-S with ImageNet100 show a 2.5% accuracy gain and 10% parameter efficiency improvement over conventional scaling. Scaled networks demonstrate superior performance when trained on small-scale datasets from scratch. We introduce the first intact scaling mechanism for vision transformers, a step towards efficient model scaling.

CLMay 7, 2023
Empowering Language Model with Guided Knowledge Fusion for Biomedical Document Re-ranking

Deepak Gupta, Dina Demner-Fushman

Pre-trained language models (PLMs) have proven to be effective for the document re-ranking task. However, they lack the ability to fully interpret the semantics of biomedical and health-care queries and often rely on simplistic patterns for retrieving documents. To address this challenge, we propose an approach that integrates knowledge and the PLMs to guide the model toward effectively capturing information from external sources and retrieving the correct documents. We performed comprehensive experiments on two biomedical and open-domain datasets that show that our approach significantly improves on vanilla PLMs and other existing approaches for the document re-ranking task.

CVJan 30, 2022
A Dataset for Medical Instructional Video Classification and Question Answering

Deepak Gupta, Kush Attal, Dina Demner-Fushman

This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this, we created the MedVidCL and MedVidQA datasets and introduce the tasks of Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), two tasks that focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 annotated videos for the MVC task and 3,010 annotated questions and answer timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task with the created MedVidCL and MedVidQA datasets and proposed multimodal learning methods that set competitive baselines for future research.

CLSep 10, 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation

Humair Raj Khan, Deepak Gupta, Asif Ekbal

Pre-trained language-vision models have shown remarkable performance on the visual question answering (VQA) task. However, most pre-trained models are trained by only considering monolingual learning, especially in resource-rich languages like English. Training such models for multilingual setups demands high computing resources and multilingual language-vision datasets, which hinders their application in practice. To alleviate these challenges, we propose a knowledge distillation approach to extend an English language-vision model (teacher) into an equally effective multilingual and code-mixed model (student). Unlike the existing knowledge distillation methods, which only use the output from the last layer of the teacher network for distillation, our student model learns and imitates the teacher from multiple intermediate layers (language and vision encoders) with appropriately designed distillation objectives for incremental knowledge extraction. We also create a large-scale multilingual and code-mixed VQA dataset in eleven different language setups covering multiple Indian and European languages. Experimental results and in-depth analysis show the effectiveness of the proposed VQA model over the pre-trained language-vision models on eleven diverse language setups.
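
To make the contrast with last-layer-only distillation concrete, the sketch below combines a hidden-state matching term over several intermediate layers with a softened output-distribution term. The abstract does not specify the actual objectives, so mean squared error and KL divergence are stand-ins, and `distillation_loss`, the temperature, and the toy vectors are hypothetical. Teacher and student hidden states are assumed to have the same size.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_hidden, student_hidden,
                      teacher_logits, student_logits,
                      layer_weight=1.0, temperature=2.0):
    # Intermediate-layer term: the student imitates each matched teacher layer.
    hidden_term = sum(mse(t, s) for t, s in zip(teacher_hidden, student_hidden))
    # Output term: the student matches the teacher's softened answer distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return layer_weight * hidden_term + kl_divergence(p_teacher, p_student)

teacher_hidden = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.3]]   # two matched layers
student_hidden = [[0.1, 0.0, 0.4], [0.8, 0.5, -0.1]]
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.7, -0.8]

loss = distillation_loss(teacher_hidden, student_hidden,
                         teacher_logits, student_logits)
perfect = distillation_loss(teacher_hidden, teacher_hidden,
                            teacher_logits, teacher_logits)
```

The loss is zero only when the student reproduces both the teacher's intermediate representations and its output distribution, which is what distinguishes this setup from matching the final layer alone.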

CLAug 16, 2021
BloomNet: A Robust Transformer based model for Bloom's Learning Outcome Classification

Abdul Waheed, Muskan Goyal, Nimisha Mittal et al.

Bloom's taxonomy is a common paradigm for categorizing educational learning objectives into three learning levels: cognitive, affective, and psychomotor. For the optimization of educational programs, it is crucial to design course learning outcomes (CLOs) according to the different cognitive levels of Bloom's taxonomy. Usually, administrators of the institutions manually complete the tedious work of mapping CLOs and examination questions to Bloom's taxonomy levels. To address this issue, we propose a transformer-based model named BloomNet that captures linguistic as well as semantic information to classify the course learning outcomes (CLOs). We compare BloomNet with a diverse set of basic as well as strong baselines and we observe that our model performs better than all the experimented baselines. Further, we also test the generalization capability of BloomNet by evaluating it on different distributions which our model does not encounter during training and we observe that our model is less susceptible to distribution shift compared to the other considered models. We support our findings by performing extensive result analysis. In an ablation study, we observe that explicitly encapsulating the linguistic information along with semantic information improves the model's IID (independent and identically distributed) performance as well as its OOD (out-of-distribution) generalization capability.

CLJul 1, 2021
Reinforcement Learning for Abstractive Question Summarization with Question-aware Semantic Rewards

Shweta Yadav, Deepak Gupta, Asma Ben Abacha et al.

The growth of online consumer health questions has led to the necessity for reliable and accurate question answering systems. A recent study showed that manual summarization of consumer health questions brings significant improvement in retrieving relevant answers. However, the automatic summarization of long questions is a challenging task due to the lack of training data and the complexity of the related subtasks, such as the question focus and type recognition. In this paper, we introduce a reinforcement learning-based framework for abstractive question summarization. We propose two novel rewards obtained from the downstream tasks of (i) question-type identification and (ii) question-focus recognition to regularize the question generation model. These rewards ensure the generation of semantically valid questions and encourage the inclusion of key medical entities/foci in the question summary. We evaluated our proposed method on two benchmark datasets and achieved higher performance than state-of-the-art models. The manual evaluation of the summaries reveals that the generated questions are more diverse and have fewer factual inconsistencies than the baseline summaries.

CLJun 1, 2021
Question-aware Transformer Models for Consumer Health Question Summarization

Shweta Yadav, Deepak Gupta, Asma Ben Abacha et al.

Searching for health information online is becoming customary for more and more consumers every day, which makes the need for efficient and reliable question answering systems more pressing. An important contributor to the success rates of these systems is their ability to fully understand the consumers' questions. However, these questions are frequently longer than needed and mention peripheral information that is not useful in finding relevant answers. Question summarization is one of the potential solutions to simplifying long and complex consumer questions before attempting to find an answer. In this paper, we study the task of abstractive summarization for real-world consumer health questions. We develop an abstractive question summarization model that leverages the semantic interpretation of a question via recognition of medical entities, which enables the generation of informative summaries. Towards this, we propose multiple Cloze tasks (i.e., the task of filling in missing words in a given context) to identify the key medical entities that enforce the model to have better coverage in question-focus recognition. Additionally, we infuse the decoder inputs with question-type information to generate question-type driven summaries. When evaluated on the MeQSum benchmark corpus, our framework outperformed the state-of-the-art method by 10.2 ROUGE-L points. We also conducted a manual evaluation to assess the correctness of the generated summaries.

IVMar 8, 2021
CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved Covid-19 Detection

Abdul Waheed, Muskan Goyal, Deepak Gupta et al.

Coronavirus (COVID-19) is a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The spread of COVID-19 seems to have a detrimental effect on the global economy and health. A positive chest X-ray of infected patients is a crucial step in the battle against COVID-19. Early results suggest that abnormalities exist in chest X-rays of patients suggestive of COVID-19. This has led to the introduction of a variety of deep learning systems, and studies have shown that the accuracy of COVID-19 patient detection through the use of chest X-rays is highly promising. Deep learning networks like convolutional neural networks (CNNs) need a substantial amount of training data. Because the outbreak is recent, it is difficult to gather a significant number of radiographic images in such a short time. Therefore, in this research, we present a method to generate synthetic chest X-ray (CXR) images by developing an Auxiliary Classifier Generative Adversarial Network (ACGAN) based model called CovidGAN. In addition, we demonstrate that the synthetic images produced from CovidGAN can be utilized to enhance the performance of CNN for COVID-19 detection. Classification using CNN alone yielded 85% accuracy. By adding synthetic images produced by CovidGAN, the accuracy increased to 95%. We hope this method will speed up COVID-19 detection and lead to more robust systems of radiology.

CLMar 8, 2021
Domain Controlled Title Generation with Human Evaluation

Abdul Waheed, Muskan Goyal, Nimisha Mittal et al.

We study automatic title generation and present a method for generating domain-controlled titles for scientific articles. A good title allows you to get the attention that your research deserves. A title can be interpreted as a high-compression description of a document containing information on the implemented process. For domain-controlled titles, we used the pre-trained text-to-text transformer model and the additional token technique. Title tokens are sampled from a local distribution (which is a subset of global vocabulary) of the domain-specific vocabulary and not global vocabulary, thereby generating a catchy title and closely linking it to its corresponding abstract. Generated titles looked realistic, convincing, and very close to the ground truth. We have performed automated evaluation using ROUGE metric and human evaluation using five parameters to make a comparison between human and machine-generated titles. The titles produced were considered acceptable with higher metric ratings in contrast to the original titles. Thus we concluded that our research proposes a promising method for domain-controlled title generation.

CLJan 20, 2021
Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy

Deepak Gupta, Rajkumar Pujari, Asif Ekbal et al.

In this paper, we propose a hybrid technique for semantic question matching. It uses our proposed two-layered taxonomy for English questions by augmenting state-of-the-art deep learning models with question classes obtained from a deep learning based question classifier. Experiments performed on three open-domain datasets demonstrate the effectiveness of our proposed approach. We achieve state-of-the-art results on partial ordering question ranking (POQR) benchmark dataset. Our empirical analysis shows that coupling standard distributional features (provided by the question encoder) with knowledge from taxonomy is more effective than either deep learning (DL) or taxonomy-based knowledge alone.

CVJan 14, 2021
Rescaling CNN through Learnable Repetition of Network Parameters

Arnav Chavan, Udbhav Bamba, Rishabh Tiwari et al.

Deeper and wider CNNs are known to provide improved performance for deep learning tasks. However, most such networks have poor performance gain per parameter increase. In this paper, we investigate whether the gain observed in deeper models is purely due to the addition of more optimization parameters or whether the physical size of the network as well plays a role. Further, we present a novel rescaling strategy for CNNs based on learnable repetition of its parameters. Based on this strategy, we rescale CNNs without changing their parameter count, and show that learnable sharing of weights itself can provide significant boost in the performance of any given model without changing its parameter count. We show that small base networks when rescaled, can provide performance comparable to deeper networks with as low as 6% of optimization parameters of the deeper one. The relevance of weight sharing is further highlighted through the example of group-equivariant CNNs. We show that the significant improvements obtained with group-equivariant CNNs over the regular CNNs on classification problems are only partly due to the added equivariance property, and part of it comes from the learnable repetition of network weights. For rot-MNIST dataset, we show that up to 40% of the relative gain reported by state-of-the-art methods for rotation equivariance could actually be due to just the learnt repetition of weights.

CVNov 23, 2020
Siamese Tracking with Lingual Object Constraints

Maximilian Filtenborg, Efstratios Gavves, Deepak Gupta

Classically, visual object tracking involves following a target object throughout a given video, and it provides us the motion trajectory of the object. However, for many practical applications, this output is often insufficient since additional semantic information is required to act on the video material. Example applications of this are surveillance and target-specific video summarization, where the target needs to be monitored with respect to certain predefined constraints, e.g., 'when standing near a yellow car'. This paper explores tracking visual objects subject to additional lingual constraints. Unlike Li et al., whose goal is to improve and extend tracking itself, we impose additional lingual constraints upon tracking, which enables new applications of tracking. To perform benchmarks and experiments, we contribute two datasets: c-MOT16 and c-LaSOT, curated through appending additional constraints to the frames of the original LaSOT and MOT16 datasets. We also experiment with two deep models, SiamCT-DFG and SiamCT-CA, obtained through extending a recent state-of-the-art Siamese tracking method and adding modules inspired by the fields of natural language processing and visual question answering. Through experimental results, we show that the proposed model SiamCT-CA can significantly outperform its counterparts. Furthermore, our method enables the selective compression of videos, based on the validity of the constraint.

CLSep 27, 2020
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering

Deepak Gupta, Swati Suman, Asif Ekbal

Visual Question Answering in Medical domain (VQA-Med) plays an important role in providing medical assistance to the end-users. These users are expected to raise either a straightforward question with a Yes/No answer or a challenging question that requires a detailed and descriptive answer. The existing techniques in VQA-Med fail to distinguish between the different question types, which sometimes complicates the simpler problems or over-simplifies the complicated ones. It is certainly true that for different question types, several distinct systems can lead to confusion and discomfort for the end-users. To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction. We refer to our proposed approach as Hierarchical Question Segregation based Visual Question Answering, HQS-VQA for short. Our contributions are three-fold, viz. firstly, we propose a question segregation (QS) technique for VQA-Med; secondly, we integrate the QS model into the hierarchical deep multi-modal neural network to generate proper answers to the queries related to medical images; and thirdly, we study the impact of QS in Medical-VQA by comparing the performance of the proposed model with QS and a model without QS. We evaluate the performance of our proposed model on two benchmark datasets, viz. RAD and CLEF18. Experimental results show that our proposed HQS-VQA technique outperforms the baseline models with significant margins. We also conduct a detailed quantitative and qualitative analysis of the obtained results and discover potential causes of errors and their solutions.

AIAug 26, 2020
A Three-Stage Algorithm for the Large Scale Dynamic Vehicle Routing Problem with an Industry 4.0 Approach

Maryam Abdirad, Krishna Krishnan, Deepak Gupta

Companies are eager to have a smart supply chain, especially when they have a dynamic system. Industry 4.0 is a concept which concentrates on mobility and real-time integration. Thus, it can be considered as a necessary component that has to be implemented for a Dynamic Vehicle Routing Problem. The aim of this research is to solve large-scale DVRP (LSDVRP) in which the delivery vehicles must serve customer demands from a common depot to minimize transit cost while not exceeding the capacity constraint of each vehicle. In LSDVRP, it is difficult to get an exact solution and the computational time complexity grows exponentially. To find near-optimal answers for this problem, a hierarchical approach consisting of three stages, called cluster first, route construction second, route improvement third, is proposed. The major contribution of this paper is dealing with large-size real-world problems to decrease the computational time complexity. The results confirmed that the proposed methodology is applicable.

OCAug 10, 2020
A Two-Stage Metaheuristic Algorithm for the Dynamic Vehicle Routing Problem in Industry 4.0 approach

Maryam Abdirad, Krishna Krishnan, Deepak Gupta

Industry 4.0 is a concept that assists companies in developing a modern supply chain (MSC) system when they are faced with a dynamic process. Because Industry 4.0 focuses on mobility and real-time integration, it is a good framework for a dynamic vehicle routing problem (DVRP). This research works on DVRP. The aim of this research is to minimize transportation cost without exceeding the capacity constraint of each vehicle while serving customer demands from a common depot. Meanwhile, new orders arrive at a specific time into the system while the vehicles are executing the delivery of existing orders. This paper presents a two-stage hybrid algorithm for solving the DVRP. In the first stage, construction algorithms are applied to develop the initial route. In the second stage, improvement algorithms are applied. Experiments were designed for different sizes of problems. Analysis results show the effectiveness of the proposed algorithm.
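
The two-stage structure (construct an initial route, then improve it) can be sketched on a static snapshot of the problem. The abstract does not name the construction or improvement algorithms, so nearest-neighbour insertion and 2-opt are used here purely as common stand-ins; the function names, coordinates, demands, and capacity are hypothetical, and the arrival of new orders during execution is not modelled.

```python
import math

def route_length(route, coords):
    """Length of the tour depot -> customers -> depot (the depot is index 0)."""
    path = [0] + route + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def construct_routes(coords, demand, capacity):
    """Stage 1: nearest-neighbour construction under a vehicle capacity limit."""
    unserved = set(range(1, len(coords)))
    routes = []
    while unserved:
        route, load, current = [], 0, 0
        while True:
            feasible = [c for c in unserved if load + demand[c] <= capacity]
            if not feasible:
                break
            # Closest feasible customer; ties are broken by customer index.
            nxt = min(feasible, key=lambda c: (math.dist(coords[current], coords[c]), c))
            route.append(nxt)
            load += demand[nxt]
            unserved.remove(nxt)
            current = nxt
        if not route:
            raise ValueError("a customer demand exceeds the vehicle capacity")
        routes.append(route)
    return routes

def two_opt(route, coords):
    """Stage 2: reverse segments of the route for as long as the tour gets shorter."""
    best = route[:]
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(candidate, coords) < route_length(best, coords) - 1e-12:
                    best, improved = candidate, True
    return best

coords = [(0, 0), (2, 0), (2, 2), (0, 2), (-2, 0), (-2, -2), (0, -2)]
demand = [0, 1, 1, 1, 1, 1, 1]
capacity = 3

initial = construct_routes(coords, demand, capacity)
improved_routes = [two_opt(r, coords) for r in initial]
cost_before = sum(route_length(r, coords) for r in initial)
cost_after = sum(route_length(r, coords) for r in improved_routes)
```

In a dynamic setting, one plausible use of such a sketch would be to rerun both stages on the customers that are still unserved whenever a new order arrives, although the abstract does not describe how the paper handles re-optimization.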

CLApr 5, 2020
Reinforced Multi-task Approach for Multi-hop Question Generation

Deepak Gupta, Hardik Chauhan, Akella Ravi Tej et al.

Question generation (QG) attempts to solve the inverse of the question answering (QA) problem by generating a natural language question given a document and an answer. While sequence-to-sequence neural models surpass rule-based systems for QG, they are limited in their capacity to focus on more than one supporting fact. For QG, we often require multiple supporting facts to generate high-quality questions. Inspired by recent works on multi-hop reasoning in QA, we take up multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context. We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator. In addition, we propose a question-aware reward function in a Reinforcement Learning (RL) framework to maximize the utilization of the supporting facts. We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA. Empirical evaluation shows our model to outperform the single-hop neural question generation models on both automatic evaluation metrics such as BLEU, METEOR, and ROUGE, and human evaluation metrics for quality and coverage of the generated questions.

IVMar 23, 2020
Diagnosis of Breast Cancer Based on Modern Mammography using Hybrid Transfer Learning

Aditya Khamparia, Subrato Bharati, Prajoy Podder et al.

Breast cancer is a common cancer for women. Early detection of breast cancer can considerably increase the survival rate of women. This paper mainly focuses on the transfer learning process to detect breast cancer. Modified VGG (MVGG), residual network, and mobile network models are proposed and implemented in this paper. The DDSM dataset is used in this paper. Experimental results show that our proposed hybrid transfer learning model (Fusion of MVGG16 and ImageNet) provides an accuracy of 88.3% where the number of epochs is 15. On the other hand, the modified VGG 16 architecture (MVGG 16) alone provides an accuracy of 80.8% and MobileNet provides an accuracy of 77.2%. Thus, the proposed hybrid pre-trained network outperforms the single architectures. This architecture can be considered an effective tool for radiologists to reduce the false negative and false positive rates. Therefore, the efficiency of mammography analysis will be improved.

MMDec 22, 2019
Hiding Data in Images Using Cryptography and Deep Neural Network

Kartik Sharma, Ashutosh Aggarwal, Tanay Singhania et al.

Steganography is the art of obscuring data inside another quotidian file of similar or varying type. Hiding data has always been of significant importance to digital forensics. Previously, steganography has been combined with cryptography and neural networks separately. This research, in contrast, combines steganography, cryptography, and neural networks all together to hide an image inside another container image of larger or the same size. Although the cryptographic technique used is quite simple, it is effective when convoluted with deep neural nets. Other steganography techniques hide data efficiently but in a uniform pattern, which makes them less secure. This method targets both challenges and makes data hiding secure and non-uniform.

CLSep 9, 2019
Improving Neural Question Generation using World Knowledge

Deepak Gupta, Kaheer Suleman, Mahmoud Adada et al.

In this paper, we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps to encode additional information related to the entities present in the passage required to generate human-like questions. We evaluate our models on both SQuAD and MS MARCO to demonstrate the usefulness of the world knowledge features. The proposed world knowledge enriched question generation model is able to outperform the vanilla neural question generation model by 1.37 and 1.59 absolute BLEU 4 score on SQuAD and MS MARCO test dataset respectively.

CLNov 1, 2018
Helping each Other: A Framework for Customer-to-Customer Suggestion Mining using a Semi-supervised Deep Neural Network

Hitesh Golchha, Deepak Gupta, Asif Ekbal et al.

Suggestion mining is increasingly becoming an important task along with sentiment analysis. In today's cyberspace world, people not only express their sentiments and dispositions towards some entities or services, but they also spend considerable time sharing their experiences and advice with fellow customers and the product/service providers with a two-fold agenda: helping fellow customers who are likely to share a similar experience, and motivating the producer to bring specific changes in their offerings which would be more appreciated by the customers. In our current work, we propose a hybrid deep learning model to identify whether a review text contains any suggestion. The model employs semi-supervised learning to leverage the useful information from the large amount of unlabeled data. We evaluate the performance of our proposed model on a benchmark customer review dataset comprising reviews from the Hotel and Electronics domains. Our proposed approach achieves F-scores of 65.6% and 65.5% on the Hotel and Electronics review datasets, respectively. These performances are significantly better than those of the existing state-of-the-art system.

AIAug 5, 2018
Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering

Deepak Gupta, Sarah Kohail, Pushpak Bhattacharyya

Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers, if one exists. In this paper, we present a hybrid deep learning model for answer triggering, which combines several dependency graph based alignment features, namely graph edit distance, graph-based similarity and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination can more accurately trigger a candidate answer compared to the previous state-of-the-art models. A comparative study on the WikiQA dataset shows a 5.86% absolute F-score improvement at the question level.

CLOct 12, 2017
Auto Analysis of Customer Feedback using CNN and GRU Network

Deepak Gupta, Pabitra Lenka, Harsimran Bedi et al.

Analyzing customer feedback is the best way to channelize the data into new marketing strategies that benefit entrepreneurs as well as customers. Therefore, an automated system which can analyze customer behavior is in great demand. Users may write feedback in any language, and hence mining appropriate information often becomes intractable. Especially in a traditional feature-based supervised model, it is difficult to build a generic system as one has to understand the concerned language for finding the relevant features. In order to overcome this, we propose deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches that do not require handcrafting of features. We evaluate these techniques for analyzing customer feedback sentences in four languages, namely English, French, Japanese and Spanish. Our empirical analysis shows that our models perform well in all four languages on the setups of the IJCNLP Shared Task on Customer Feedback Analysis. Our model achieved the second rank in French, with an accuracy of 71.75%, and third rank for all the other languages.

CLFeb 1, 2017
SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text

Deepak Gupta, Shubham Tripathi, Asif Ekbal et al.

Use of social media has grown dramatically during the last few years. Users follow informal languages in communicating through social media. The language of communication is often mixed in nature, where people transcribe their regional language with English, and this technique is found to be extremely popular. Natural language processing (NLP) aims to infer information from this text, where Part-of-Speech (PoS) tagging plays an important role in getting the prosody of the written text. For the task of PoS tagging on Code-Mixed Indian Social Media Text, we develop a supervised system based on a Conditional Random Field classifier. In order to tackle the problem effectively, we have focused on extracting rich linguistic features. We participate in three different language pairs, i.e., English-Hindi, English-Bengali and English-Telugu, on three different social media platforms: Twitter, Facebook & WhatsApp. The proposed system is able to successfully assign coarse as well as fine-grained PoS tag labels for a given code-mixed sentence. Experiments show that our system is quite generic and shows encouraging performance levels on all three language pairs in all the domains.