DL Jul 4, 2023
A Bibliographic Study on Artificial Intelligence Research: Global Panorama and Indian Appearance
Amit Tiwari, Susmita Bardhan, Vikas Kumar
The present study identifies and assesses the bibliographic trend in Artificial Intelligence (AI) research for the years 2015-2020 using the science mapping method of bibliometric study. The required data was collected from the Scopus database. To make the collected data analysis-ready, essential data transformation was performed both manually and with the help of a tool, OpenRefine. For determining the trend and performing the mapping techniques, the top five open access and commercial journals of AI were chosen based on their CiteScore-driven ranking. The work includes 6880 articles published in the specified period for analysis. The trend is based on country-wise publications, year-wise publications, topical terms in AI, top-cited articles, prominent authors, major institutions, involvement of industries in AI, and Indian appearance. The results show that, compared to open access journals, commercial journals have a higher CiteScore and a larger number of articles published over the years. Additionally, IEEE is the prominent publisher, publishing 84% of the top-cited publications. Further, China and the United States are the major contributors to literature in the AI domain. The study reveals that neural networks and deep learning are the major topics included in top AI research publications. Recently, not only public institutions but also private bodies have been investing their resources in AI research. The study also investigates the relative position of Indian researchers in terms of AI research. The present work helps in understanding the initial development, current standing, and future direction of AI.
IR Jun 22, 2023
Data augmentation and refinement for recommender system: A semi-supervised approach using maximum margin matrix factorization
Shamal Shaikh, Venkateswara Rao Kagita, Vikas Kumar et al.
Collaborative filtering (CF) has become a popular method for developing recommender systems (RSs) where ratings of a user for new items are predicted based on her past preferences and available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating predictions, which has not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that any CF algorithm's predictions with low confidence are due to some deficiency in the training data and hence, the performance of the algorithm can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches.
IR Mar 26, 2022
Transfer of codebook latent factors for cross-domain recommendation with non-overlapping data
Sowmini Devi Veeramachaneni, Arun K Pujari, Vineet Padmanabhan et al.
Recommender systems based on collaborative filtering play a vital role in many e-commerce applications, as they guide users in finding items of interest based on their past transactions and the feedback of other similar customers. Data sparsity, which arises from the small number of transactions and the limited feedback data, is one of the major drawbacks of the collaborative filtering technique. To reduce the sparsity problem, techniques known as transfer learning or cross-domain recommendation have emerged. In transfer learning methods, data from other dense domain(s) (source) is used to predict the missing ratings in the sparse domain (target). In this paper, we propose a novel transfer learning approach for cross-domain recommendation, wherein the cluster-level rating pattern (codebook) of the source domain is obtained via a co-clustering technique. Thereafter, we apply the Maximum Margin Matrix Factorization (MMMF) technique on the codebook to learn the user and item latent features of the codebook. Prediction of the target rating matrix is achieved by introducing these latent features in a novel way into the optimisation function. In the experiments, we demonstrate that our model improves the prediction accuracy of the target matrix on benchmark datasets.
CV Dec 2, 2024 (Code)
Multimodal Fusion Learning with Dual Attention for Medical Imaging
Joy Dhar, Nayyar Zaidi, Maryam Haghighat et al.
Multimodal fusion learning has shown significant promise in classifying various diseases such as skin cancer and brain tumors. However, existing methods face three key limitations. First, they often lack generalizability to other diagnosis tasks due to their focus on a particular disease. Second, they do not fully leverage multiple health records from diverse modalities to learn robust complementary information. Finally, they typically rely on a single attention mechanism, missing the benefits of multiple attention strategies within and across various modalities. To address these issues, this paper proposes a dual robust information fusion attention mechanism (DRIFA) that leverages two attention modules: the multi-branch fusion attention module and the multimodal information fusion attention module. DRIFA can be integrated with any deep neural network, forming a multimodal fusion learning framework denoted as DRIFA-Net. We show that the multi-branch fusion attention of DRIFA learns enhanced representations for each modality, such as dermoscopy, pap smear, MRI, and CT-scan, whereas the multimodal information fusion attention module learns more refined multimodal shared representations, improving the network's generalization across multiple tasks and enhancing overall performance. Additionally, to estimate the uncertainty of DRIFA-Net predictions, we employ an ensemble Monte Carlo dropout strategy. Extensive experiments on five publicly available datasets with diverse modalities demonstrate that our approach consistently outperforms state-of-the-art methods. The code is available at https://github.com/misti1203/DRIFA-Net.
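The Monte Carlo dropout idea mentioned in the abstract can be illustrated with a minimal sketch. This is not DRIFA-Net's implementation: the toy linear scorer, the function names `dropout_forward` and `mc_dropout_predict`, and all parameter values are assumptions made here for illustration. The sketch only shows the general technique of keeping dropout active at prediction time and summarizing several stochastic passes by their mean and spread.

```python
import random
import statistics


def dropout_forward(x, weights, drop_prob, rng):
    """One stochastic pass of a toy linear scorer with dropout left active."""
    keep = 1.0 - drop_prob
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:           # this input unit survives dropout
            total += (xi / keep) * wi     # inverted-dropout rescaling
    return total


def mc_dropout_predict(x, weights, drop_prob=0.5, n_passes=200, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes.

    The standard deviation serves as a simple uncertainty estimate.
    """
    rng = random.Random(seed)
    samples = [dropout_forward(x, weights, drop_prob, rng)
               for _ in range(n_passes)]
    return statistics.mean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
    print(f"prediction: {mean:.3f}, uncertainty (std): {std:.3f}")
```

With `drop_prob=0.0` every pass is identical, so the spread collapses to zero; larger dropout rates give wider spreads for the same input.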
IR Feb 23
DReX: An Explainable Deep Learning-based Multimodal Recommendation Framework
Adamya Shyam, Venkateswara Rao Kagita, Bharti Rana et al.
Multimodal recommender systems leverage diverse data sources, such as user interactions, content features, and contextual information, to address challenges like cold-start and data sparsity. However, existing methods often suffer from one or more key limitations: processing different modalities in isolation, requiring complete multimodal data for each interaction during training, or learning user and item representations independently. These factors contribute to increased complexity and potential misalignment between user and item embeddings. To address these challenges, we propose DReX, a unified multimodal recommendation framework that incrementally refines user and item representations by leveraging interaction-level features from multimodal feedback. Our model employs gated recurrent units to selectively integrate these fine-grained features into global representations. This incremental update mechanism provides three key advantages: (1) simultaneous modeling of both nuanced interaction details and broader preference patterns, (2) elimination of the need for separate user and item feature extraction processes, leading to enhanced alignment in their learned representations, and (3) inherent robustness to varying or missing modalities. We evaluate the performance of the proposed approach on three real-world datasets containing reviews and ratings as interaction modalities. By considering review text as a modality, our approach automatically generates interpretable keyword profiles for both users and items, which supplement the recommendation process with interpretable preference indicators. Experimental results demonstrate that our approach outperforms state-of-the-art methods across all evaluated datasets.
AI Jan 16
What Matters in Data Curation for Multimodal Reasoning? Insights from the DCVLR Challenge
Yosub Shin, Michael Buriek, Boris Sobolev et al.
We study data curation for multimodal reasoning through the NeurIPS 2025 Data Curation for Vision-Language Reasoning (DCVLR) challenge, which isolates dataset selection by fixing the model and training protocol. Using a compact curated dataset derived primarily from Walton Multimodal Cold Start, our submission placed first in the challenge. Through post-competition ablations, we show that difficulty-based example selection on an aligned base dataset is the dominant driver of performance gains. Increasing dataset size does not reliably improve mean accuracy under the fixed training recipe, but mainly reduces run-to-run variance, while commonly used diversity and synthetic augmentation heuristics provide no additional benefit and often degrade performance. These results characterize DCVLR as a saturation-regime evaluation and highlight the central role of alignment and difficulty in data-efficient multimodal reasoning.
CL Nov 15, 2025
AugAbEx : Way Forward for Extractive Case Summarization
Purnima Bindal, Vikas Kumar, Sagar Rathore et al.
Summarization of legal judgments poses a heavy cognitive burden on law practitioners due to the complexity of the language, context-sensitive legal jargon, and the length of the document. Therefore, the automatic summarization of legal documents has attracted serious attention from natural language processing researchers. Since the abstractive summaries of legal documents generated by deep neural methods remain prone to the risk of misrepresenting nuanced legal jargon or overlooking key contextual details, we envisage a rising trend toward the use of extractive case summarizers. Given the high cost of human annotation for gold standard extractive summaries, we engineer a light and transparent pipeline that leverages existing abstractive gold standard summaries to create the corresponding extractive gold standard versions. The approach ensures that the experts' opinions ensconced in the original gold standard abstractive summaries are carried over to the transformed extractive summaries. We aim to augment seven existing case summarization datasets, which include abstractive summaries, by incorporating corresponding extractive summaries, and to create an enriched data resource for the case summarization research community. To ensure the quality of the augmented extractive summaries, we perform an extensive comparative evaluation with the original abstractive gold standard summaries covering structural, lexical, and semantic dimensions. We also compare the domain-level information of the two summaries. We commit to releasing the augmented datasets in the public domain for use by the research community and believe that the resource will offer opportunities to advance the field of automatic summarization of legal documents.
CV Apr 19, 2021 (Code)
OmniLayout: Room Layout Reconstruction from Indoor Spherical Panoramas
Shivansh Rao, Vikas Kumar, Daniel Kifer et al.
Given a single RGB panorama, the goal of 3D layout reconstruction is to estimate the room layout by predicting the corners, floor boundary, and ceiling boundary. A common approach has been to use standard convolutional networks to predict the corners and boundaries, followed by post-processing to generate the 3D layout. However, the space-varying distortions in panoramic images are not compatible with the translational equivariance property of standard convolutions, thus degrading performance. Instead, we propose to use spherical convolutions. The resulting network, which we call OmniLayout, performs convolutions directly on the sphere surface, sampling according to the inverse equirectangular projection, and is hence invariant to equirectangular distortions. Using a new evaluation metric, we show that our network reduces the error in the heavily distorted regions (near the poles) by approximately 25% when compared to standard convolutional networks. Experimental results show that OmniLayout outperforms the state-of-the-art by approximately 4% on two different benchmark datasets (PanoContext and Stanford 2D-3D). Code is available at https://github.com/rshivansh/OmniLayout.
HC Mar 16, 2024
Human Centered AI for Indian Legal Text Analytics
Sudipto Ghosh, Devanshu Verma, Balaji Ganesan et al.
Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated into a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.
CL Mar 3, 2024
Infusing Knowledge into Large Language Models with Contextual Prompts
Kinshuk Vasisht, Balaji Ganesan, Vikas Kumar et al.
Knowledge infusion is a promising method for enhancing Large Language Models for domain-specific NLP tasks rather than pre-training models over large data from scratch. These augmented LLMs typically depend on additional pre-training or knowledge prompts from an existing knowledge graph, which is impractical in many applications. In contrast, knowledge infusion directly from relevant documents is more generalisable and alleviates the need for structured knowledge graphs while also being useful for entities that are usually not found in any knowledge graph. With this motivation, we propose a simple yet generalisable approach for knowledge infusion by generating prompts from the context in the input text. Our experiments show the effectiveness of our approach which we evaluate by probing the fine-tuned LLMs.
CL Dec 17, 2023
Deep dive into language traits of AI-generated Abstracts
Vikas Kumar, Amisha Bharti, Devanshu Verma et al.
Generative language models, such as ChatGPT, have garnered attention for their ability to generate human-like writing in various fields, including academic research. The rapid proliferation of generated texts has bolstered the need for automatic identification to uphold transparency and trust in the information. However, these generated texts closely resemble human writing and often differ only subtly in grammatical structure, tone, and patterns, which makes systematic scrutiny challenging. In this work, we attempt to detect the Abstracts generated by ChatGPT, which are much shorter in length and bounded. We extract the texts' semantic and lexical properties and observe that traditional machine learning models can confidently detect these Abstracts.
LG Mar 24, 2025
Geometric Preference Elicitation for Minimax Regret Optimization in Uncertainty Matroids
Aditya Sai Ellendula, Arun K Pujari, Vikas Kumar et al.
This paper presents an efficient preference elicitation framework for uncertain matroid optimization, where precise weight information is unavailable, but insights into possible weight values are accessible. The core innovation of our approach lies in its ability to systematically elicit user preferences, aligning the optimization process more closely with decision-makers' objectives. By incrementally querying preferences between pairs of elements, we iteratively refine the parametric uncertainty regions, leveraging the structural properties of matroids. Our method aims to achieve the exact optimum by reducing regret with a few elicitation rounds. Additionally, our approach avoids the computation of Minimax Regret and the use of Linear programming solvers at every iteration, unlike previous methods. Experimental results on four standard matroids demonstrate that our method reaches optimality more quickly and with fewer preference queries than existing techniques.
CL Jun 16, 2024
Citation-Based Summarization of Landmark Judgments
Purnima Bindal, Vikas Kumar, Vasudha Bhatnagar et al.
Landmark judgments are of prime importance in the Common Law System because of their exceptional jurisprudence and frequent references in other judgments. In this work, we leverage contextual references available in citing judgments to create an extractive summary of the target judgment. We evaluate the proposed algorithm on two datasets curated from the judgments of Indian Courts and find the results promising.
LG Sep 18, 2021
Inductive Conformal Recommender System
Venkateswara Rao Kagita, Arun K Pujari, Vineet Padmanabhan et al.
Traditional recommendation algorithms develop techniques that can help people to choose desirable items. However, in many real-world applications, along with a set of recommendations, it is also essential to quantify each recommendation's (un)certainty. The conformal recommender system uses the experience of a user to output a set of recommendations, each associated with a precise confidence value. Given a significance level $\varepsilon$, it provides a bound $\varepsilon$ on the probability of making a wrong recommendation. The conformal framework uses a key concept called the nonconformity measure, which measures the strangeness of an item with respect to other items. One of the significant design challenges of any conformal recommendation framework is integrating nonconformity measures with the recommendation algorithm. This paper introduces an inductive variant of a conformal recommender system. We propose and analyze different nonconformity measures in the inductive setting. We also provide theoretical proofs on the error-bound and the time complexity. Extensive empirical analysis on ten benchmark datasets demonstrates that the inductive variant substantially improves the performance in computation time while preserving the accuracy.
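The role of the significance level and the nonconformity measure can be illustrated with the standard inductive conformal p-value. This is a generic sketch, not the paper's recommender: the nonconformity scores below are made-up numbers, and the function names `conformal_p_value` and `recommend` are hypothetical. The abstract does not specify which nonconformity measures the paper proposes, so the scores are simply taken as given.

```python
def conformal_p_value(calibration_scores, candidate_score):
    """Standard inductive conformal p-value.

    Fraction of calibration nonconformity scores that are at least as
    large as the candidate's score, with the usual +1 smoothing.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= candidate_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def recommend(candidate_scores, calibration_scores, epsilon):
    """Keep every candidate item whose p-value exceeds epsilon.

    candidate_scores maps item -> nonconformity score (higher = stranger).
    """
    kept = {}
    for item, score in candidate_scores.items():
        p = conformal_p_value(calibration_scores, score)
        if p > epsilon:
            kept[item] = p
    return kept


if __name__ == "__main__":
    calibration = [0.1, 0.2, 0.3, 0.4]          # illustrative values only
    candidates = {"a": 0.05, "b": 0.35, "c": 0.9}
    print(recommend(candidates, calibration, epsilon=0.25))
```

Because the calibration scores are computed once and reused for every candidate, the per-item cost is a single pass over the calibration set, which is the usual source of the inductive variant's speed advantage.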
IR Aug 2, 2021
A Hinge-Loss based Codebook Transfer for Cross-Domain Recommendation with Nonoverlapping Data
Sowmini Devi Veeramachaneni, Arun K Pujari, Vineet Padmanabhan et al.
Recommender systems (RS), especially collaborative filtering (CF) based RS, have been playing an important role in many e-commerce applications. As the information being searched over the internet is rapidly increasing, users often face difficulty in finding items of interest, and RS often provide help in such tasks. Recent studies show that, as the item space increases and the number of items rated by users becomes very small, issues like sparsity arise. To mitigate the sparsity problem, transfer learning techniques are being used, wherein the data from a dense domain (source) is considered in order to predict the missing entries in the sparse domain (target). In this paper, we propose a transfer learning approach for cross-domain recommendation when the two domains have no overlap of users and items. In our approach, the transfer of knowledge from the source to the target domain is done in a novel way. We make use of a co-clustering technique to obtain the codebook (cluster-level rating pattern) of the source domain. Using a hinge loss function, we transfer the learnt codebook of the source domain to the target. The use of hinge loss as a loss function is novel and has not been tried before in transfer learning. We demonstrate that our technique improves the approximation of the target matrix on benchmark datasets.
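For readers unfamiliar with the hinge loss named above, a minimal sketch of its textbook form follows. How the paper applies it to codebook transfer is not stated in the abstract, so this shows only the generic loss for labels in {-1, +1}; the function names and the example values are assumptions for illustration.

```python
def hinge_loss(score, label, margin=1.0):
    """Textbook hinge loss: zero once label * score clears the margin."""
    return max(0.0, margin - label * score)


def mean_hinge_loss(scores, labels, margin=1.0):
    """Average hinge loss over paired predictions and +/-1 labels."""
    losses = [hinge_loss(s, y, margin) for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A confident correct score, a weak correct score, and a wrong score.
    print(mean_hinge_loss([2.0, 0.5, 0.5], [1, 1, -1]))
```

Unlike squared error, the loss is exactly zero for predictions that are correct by at least the margin, so only borderline or wrong entries contribute to the objective.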
LG Apr 15, 2021
Machine Learning Approaches for Type 2 Diabetes Prediction and Care Management
Aloysius Lim, Ashish Singh, Jody Chiam et al.
Prediction of diabetes and its various complications has been studied in a number of settings, but a comprehensive overview of problem setting for diabetes prediction and care management has not been addressed in the literature. In this document we seek to remedy this omission in literature with an encompassing overview of diabetes complication prediction as well as situating this problem in the context of real world healthcare management. We illustrate various problems encountered in real world clinical scenarios via our own experience with building and deploying such models. In this manuscript we illustrate a Machine Learning (ML) framework for addressing the problem of predicting Type 2 Diabetes Mellitus (T2DM) together with a solution for risk stratification, intervention and management. These ML models align with how physicians think about disease management and mitigation, which comprises these four steps: Identify, Stratify, Engage, Measure.
LG Feb 7, 2021
Assessing Fairness in Classification Parity of Machine Learning Models in Healthcare
Ming Yuan, Vikas Kumar, Muhammad Aurangzeb Ahmad et al.
Fairness in AI and machine learning systems has become a fundamental problem in the accountability of AI systems. While the need for accountability of AI models is near ubiquitous, healthcare in particular is a challenging field where accountability of such systems takes on additional importance, as decisions in healthcare can have life-altering consequences. In this paper, we present preliminary results on fairness in the context of classification parity in healthcare. We also present some exploratory methods for improving fairness and choosing appropriate classification algorithms in the context of healthcare.
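Classification parity is usually checked by comparing a classifier's rates across groups. The sketch below is an illustration under assumptions, not the paper's method: it computes the predicted-positive rate and the true-positive rate per group and reports the largest gap between groups. The function names, the dictionary keys, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group predicted-positive rate and true-positive rate (TPR)."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += int(p == 1)
        c["pos"] += int(t == 1)
        c["tp"] += int(t == 1 and p == 1)
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "positive_rate": c["pred_pos"] / c["n"],
            # TPR is undefined for a group with no actual positives.
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
        }
    return rates


def parity_gap(rates, key):
    """Largest difference in the chosen rate between any two groups."""
    values = [r[key] for r in rates.values() if r[key] is not None]
    return max(values) - min(values)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    groups = ["A", "A", "A", "B", "B", "B"]
    rates = group_rates(y_true, y_pred, groups)
    print(rates)
    print("positive-rate gap:", parity_gap(rates, "positive_rate"))
    print("TPR gap:", parity_gap(rates, "tpr"))
```

A gap near zero on the chosen rate indicates parity on that criterion; which rate matters depends on the clinical setting.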
LG Feb 6, 2021
Emergency Department Optimization and Load Prediction in Hospitals
Karthik K. Padthe, Vikas Kumar, Carly M. Eckert et al.
Over the past several years, across the globe, there has been an increase in people seeking care in emergency departments (EDs). ED resources, including nurse staffing, are strained by such increases in patient volume. Accurate forecasting of incoming patient volume in emergency departments (ED) is crucial for efficient utilization and allocation of ED resources. Working with a suburban ED in the Pacific Northwest, we developed a tool powered by machine learning models, to forecast ED arrivals and ED patient volume to assist end-users, such as ED nurses, in resource allocation. In this paper, we discuss the results from our predictive models, the challenges, and the learnings from users' experiences with the tool in active clinical deployment in a real world setting.
CV Aug 6, 2020
Noisy Student Training using Body Language Dataset Improves Facial Expression Recognition
Vikas Kumar, Shivansh Rao, Li Yu
Facial expression recognition from videos in the wild is a challenging task due to the lack of abundant labelled training data. Large DNN (deep neural network) architectures and ensemble methods have resulted in better performance, but they soon reach saturation due to data inadequacy. In this paper, we use a self-training method that utilizes a combination of a labelled dataset and an unlabelled dataset (Body Language Dataset - BoLD). Experimental analysis shows that training a noisy student network iteratively helps in achieving significantly better results. Additionally, our model isolates different regions of the face and processes them independently using a multi-level attention mechanism, which further boosts the performance. Our results show that the proposed method achieves state-of-the-art performance on the benchmark datasets CK+ and AFEW 8.0 when compared to other single models.
IR Jul 23, 2019
Collaborative Filtering and Multi-Label Classification with Matrix Factorization
Vikas Kumar
Machine learning techniques for Recommendation Systems (RS) and classification have become a prime focus of research to tackle the problem of information overload. RS are software tools that aim at making informed decisions about the services that a user may like. Classification, on the other hand, deals with the categorization of a data object into one of several predefined classes. In the multi-label classification problem, unlike the traditional multi-class classification setting, each instance can be simultaneously associated with a subset of labels. The focus of this thesis is on the development of novel techniques for collaborative filtering and multi-label classification. We propose a novel method of constructing a hierarchical bi-level maximum margin matrix factorization to handle matrix completion of an ordinal rating matrix. Taking the cue from the alternative formulation of support vector machines, a novel loss function is derived by considering proximity as an alternative criterion to margin maximization for the matrix factorization framework. We extend the concept of matrix factorization to yet another important problem of machine learning, namely multi-label classification, which deals with the classification of data with multiple labels. We propose a novel piecewise-linear embedding method with a low-rank constraint on parametrization to capture nonlinear intrinsic relationships that exist in the original feature and label space. We also study the embedding of labels together with the group information with the objective of building an efficient multi-label classifier. We assume the existence of a low-dimensional space onto which the feature vectors and label vectors can be embedded. We ensure that labels belonging to the same group share the same sparsity pattern in their low-rank representations.
LG Jul 17, 2019
Block based Singular Value Decomposition approach to matrix factorization for recommender systems
Prasad Bhavana, Vikas Kumar, Vineet Padmanabhan
With the abundance of data in recent years, interesting challenges are posed in the area of recommender systems. Producing high quality recommendations with scalability and performance is the need of the hour. Singular Value Decomposition (SVD) based recommendation algorithms have been leveraged to produce better results. In this paper, we extend the SVD technique further for scalability and performance in the context of (1) multi-threading, (2) multiple computational units (with the use of Graphics Processing Units), and (3) distributed computation. We propose block based matrix factorization (BMF) paired with SVD. This enables us to take advantage of SVD over basic matrix factorization (MF) while taking advantage of parallelism and scalability through BMF. We used the Compute Unified Device Architecture (CUDA) platform and related hardware to leverage the Graphics Processing Unit (GPU) along with block based SVD and demonstrate the advantages in terms of performance and memory.
GT Jan 29, 2019
Committee Selection with Attribute Level Preferences
Venkateswara Rao Kagita, Arun K Pujari, Vineet Padmanabhan et al.
We consider the problem of committee selection from a fixed set of candidates where each candidate has multiple quantifiable attributes. To select the best possible committee, instead of voting for a candidate, a voter is allowed to approve the preferred attributes of a given candidate. Though attribute based preference is addressed in several contexts, committee selection problem with attribute approval of voters has not been attempted earlier. A committee formed on attribute preferences is more likely to be a better representative of the qualities desired by the voters and is less likely to be susceptible to collusion or manipulation. In this work, we provide a formal study of the different aspects of this problem and define properties of weak unanimity, strong unanimity, simple justified representations and compound justified representation, that are required to be satisfied by the selected committee. We show that none of the existing vote/approval aggregation rules satisfy these new properties for attribute aggregation. We describe a greedy approach for attribute aggregation that satisfies the first three properties, but not the fourth, i.e., compound justified representation, which we prove to be NP-complete. Furthermore, we prove that finding a committee with justified representation and the highest approval voting score is NP-complete.
LG Dec 24, 2018
Group Preserving Label Embedding for Multi-Label Classification
Vikas Kumar, Arun K Pujari, Vineet Padmanabhan et al.
Multi-label learning is concerned with the classification of data with multiple class labels. This is in contrast to the traditional classification problem, where every data instance has a single label. Due to the exponential size of the output space, exploiting intrinsic information in the feature and label spaces has been the major thrust of research in recent years, and the use of parametrization and embedding has been the prime focus. Researchers have studied several aspects of embedding, which include label embedding, input embedding, dimensionality reduction, and feature selection. These approaches differ from one another in their capability to capture other intrinsic properties such as label correlation, local invariance, etc. We assume here that the input data form groups and, as a result, the label matrix exhibits a sparsity pattern; hence the labels corresponding to objects in the same group have similar sparsity. In this paper, we study the embedding of labels together with the group information with the objective of building an efficient multi-label classifier. We assume the existence of a low-dimensional space onto which the feature vectors and label vectors can be embedded. To achieve this, we address three sub-problems: (1) identification of groups of labels; (2) embedding of label vectors into a low-rank space so that the sparsity characteristic of individual groups remains invariant; and (3) determining a linear mapping that embeds the feature vectors onto the same set of points, as in stage 2, in the low-dimensional space. We compare our method with seven well-known algorithms on twelve benchmark data sets. Our experimental analysis demonstrates the superiority of our proposed method over state-of-the-art algorithms for multi-label learning.