Sukumar Nandi

CL
h-index: 36
10 papers
53 citations
Novelty: 23%
AI Score: 22

10 Papers

CL · Jul 7, 2022
AsNER -- Annotated Dataset and Baseline for Assamese Named Entity recognition

Dhrubajyoti Pathak, Sukumar Nandi, Priyankoo Sarmah

We present AsNER, a named entity annotation dataset for the low-resource Assamese language, together with a baseline Assamese NER model. The dataset contains about 99k tokens of text drawn from speeches of the Prime Minister of India and an Assamese play. It also contains person names, location names and addresses. The proposed NER dataset is likely to be a significant resource for deep neural Assamese language processing. We benchmark the dataset by training and evaluating NER models with state-of-the-art embeddings and architectures for supervised named entity recognition (NER), such as FastText, BERT, XLM-R, FLAIR and MuRIL. We implement several baseline approaches with the state-of-the-art Bi-LSTM-CRF sequence tagging architecture. The best baseline achieves an F1-score of 80.69% when using MuRIL as the word embedding method. The annotated dataset and the top-performing model are made publicly available.
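NER results such as the F1-score quoted above are commonly computed at the entity level: a predicted entity counts only if its boundaries and type both match the gold annotation. The following is a minimal sketch of that computation for a single BIO-tagged sentence. The BIO tagging scheme, the function names `bio_spans` and `span_f1`, and the example tags are assumptions for illustration; the abstract does not state how the AsNER score was computed.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span when the current tag does not continue it.
        if start is not None and tag != f"I-{label}":
            spans.add((start, i, label))
            start, label = None, None
        # A "B-" tag always opens a new span; stray "I-" tags are ignored.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a prediction counts only on an exact span and type match."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
score = span_f1(gold, pred)  # precision 1/1, recall 1/2
```

A corpus-level score would pool the true-positive, predicted and gold span counts over all sentences before computing precision and recall, rather than averaging per-sentence values.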

CL · Dec 14, 2022
AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach

Dhrubajyoti Pathak, Sukumar Nandi, Priyankoo Sarmah

Part of Speech (POS) tagging is crucial to Natural Language Processing (NLP). It is a well-studied topic in several resource-rich languages. However, for numerous languages that are historically and literarily rich, the development of computational linguistic resources is still in its infancy. Assamese, an Indian scheduled language spoken by more than 25 million people, falls under this category. In this paper, we present a Deep Learning (DL)-based POS tagger for Assamese. The development process is divided into two phases. In the first phase, several pre-trained word embeddings are employed to train several tagging models. This allows us to evaluate the performance of the word embeddings in the POS tagging task. The top-performing model from the first phase is employed to annotate another set of new sentences. In the second phase, the model is trained further using the fresh dataset. Finally, we attain an F1 score of 86.52%. The model may serve as a baseline for further study on DL-based Assamese POS tagging.

CL · Jan 6, 2024
Part-of-Speech Tagger for Bodo Language using Deep Learning approach

Dhrubajyoti Pathak, Sanjib Narzary, Sukumar Nandi et al.

Language processing systems such as Part-of-speech tagging, Named entity recognition, Machine translation, Speech recognition, and Language modeling (LM) are well-studied in high-resource languages. Nevertheless, research on these systems for several low-resource languages, including Bodo, Mizo, Nagamese, and others, is either yet to commence or is in its nascent stages. Language models play a vital role in the downstream tasks of modern NLP. Extensive studies have been carried out on LMs for high-resource languages. Nevertheless, languages such as Bodo, Rabha, and Mising continue to lack coverage. In this study, we first present BodoBERT, a language model for the Bodo language. To the best of our knowledge, this work is the first effort to develop a language model for Bodo. Secondly, we present an ensemble DL-based POS tagging model for Bodo. The POS tagging model is based on combinations of BiLSTM with CRF and stacked embedding of BodoBERT with BytePairEmbeddings. We cover several language models in the experiment to see how well they work in POS tagging tasks. The best-performing model achieves an F1 score of 0.8041. A comparative experiment was also conducted on Assamese POS taggers, considering that the language is spoken in the same region as Bodo.

CL · Mar 6, 2025
Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model

Sanjib Narzary, Bihung Brahma, Haradip Mahilary et al.

Named Entity Recognition (NER) and Part-of-Speech (POS) tagging are critical tasks for Natural Language Processing (NLP), yet their availability for low-resource languages (LRLs) like Bodo remains limited. This article presents a comparative empirical study investigating the effectiveness of Google's Gemini 2.0 Flash Thinking Experimental model for zero-shot cross-lingual transfer of POS and NER tagging to Bodo. We explore two distinct methodologies: (1) direct translation of English sentences to Bodo followed by tag transfer, and (2) prompt-based tag transfer on parallel English-Bodo sentence pairs. Both methods leverage the machine translation and cross-lingual understanding capabilities of Gemini 2.0 Flash Thinking Experimental to project English POS and NER annotations onto Bodo text in CoNLL-2003 format. Our findings reveal the capabilities and limitations of each approach, demonstrating that while both methods show promise for bootstrapping Bodo NLP, prompt-based transfer exhibits superior performance, particularly for NER. We provide a detailed analysis of the results, highlighting the impact of translation quality, grammatical divergences, and the inherent challenges of zero-shot cross-lingual transfer. The article concludes by discussing future research directions, emphasizing the need for hybrid approaches, few-shot fine-tuning, and the development of dedicated Bodo NLP resources to achieve high-accuracy POS and NER tagging for this low-resource language.
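For readers unfamiliar with the output format mentioned above: CoNLL-2003 files put one token per line with four space-separated columns (word, POS tag, syntactic chunk tag, NER tag) and separate sentences with a blank line. The sketch below serializes one tagged sentence in that layout. The function name `to_conll2003`, the `_` chunk placeholder, and the English stand-in tokens are assumptions for illustration; the abstract does not say which columns the study fills or show any of its Bodo data.

```python
from typing import List, Optional


def to_conll2003(
    tokens: List[str],
    pos_tags: List[str],
    ner_tags: List[str],
    chunk_tags: Optional[List[str]] = None,
) -> str:
    """Serialize one sentence as CoNLL-2003 style lines: word POS chunk NER."""
    if chunk_tags is None:
        # Assumption: no chunk annotation is available, so use a placeholder.
        chunk_tags = ["_"] * len(tokens)
    if not (len(tokens) == len(pos_tags) == len(chunk_tags) == len(ner_tags)):
        raise ValueError("every token needs exactly one tag per column")
    lines = [
        f"{word} {pos} {chunk} {ner}"
        for word, pos, chunk, ner in zip(tokens, pos_tags, chunk_tags, ner_tags)
    ]
    return "\n".join(lines) + "\n"


# Hypothetical English stand-in for a projected, tagged sentence.
sentence = to_conll2003(
    ["Alice", "visited", "Guwahati"],
    ["NNP", "VBD", "NNP"],
    ["B-PER", "O", "B-LOC"],
)
```

The length check matters for projected annotations in particular, since a translated sentence rarely has the same number of tokens as its English source and a silent misalignment would shift every tag.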

CV · Mar 3, 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language

Pankaj Choudhury, Yogesh Aggarwal, Prabhanjan Jadhav et al.

Most existing works in image caption synthesis use computation-heavy deep neural networks and generate image descriptions in English. This often keeps this important assistive tool from widespread use across language and accessibility barriers. This work presents AC-Lite, a computationally efficient model for image captioning in the low-resource Assamese language. AC-Lite reduces computational requirements by replacing computation-heavy deep network components with lightweight alternatives. The AC-Lite model is designed through extensive ablation experiments with different image feature extractor networks and language decoders. A combination of ShuffleNetv2x1.5 with a GRU-based language decoder and bilinear attention is found to provide the best performance with minimum compute. AC-Lite was observed to achieve an 82.3 CIDEr score on the COCO-AC dataset with 2.45 GFLOPs and 22.87M parameters.

CR · Dec 28, 2021
Blockchain Meets AI for Resilient and Intelligent Internet of Vehicles

Pranav Kumar Singh, Sukumar Nandi, Sunit K. Nandi et al.

The Internet of Vehicles (IoV) is flourishing and offers various applications relating to road safety, traffic and fuel efficiency, and infotainment. Dealing with security and privacy threats and managing trust (detecting malicious and misbehaving peers) in IoV remain the most significant concerns. Artificial Intelligence is one of the most revolutionary technologies, and the predictive power of its machine learning models can help detect intrusions and misbehaviors. Similarly, empowering the state-of-the-art IoV security framework with blockchain can make it secure and resilient. This article discusses the joint use of AI and blockchain against security, privacy and trust-related risks in IoV. It also presents problems, challenges, requirements and solutions using ML and blockchain to address the aforementioned issues in IoV.

CR · Dec 18, 2020
Privacy Enhanced DigiLocker using Ciphertext-Policy Attribute-Based Encryption

Puneet Bakshi, Sukumar Nandi

Recently, the Government of India has taken several initiatives to make India digitally strong, such as providing each resident a unique digital identity, referred to as Aadhaar, and providing several online e-Governance services based on Aadhaar, such as DigiLocker. DigiLocker is an online service which provides a shareable private storage space on a public cloud to its subscribers. Although DigiLocker ensures traditional security such as data integrity and secure data access, the privacy of e-documents is yet to be addressed. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) can improve data privacy, but implementing it correctly has always been a challenge. This paper presents a scheme to implement a privacy-enhanced DigiLocker using CP-ABE.

DC · Jul 13, 2020
V-CARE: A Blockchain Based Framework for Secure Vehicle Health Record System

Pranav Kumar Singh, Roshan Singh, Sukumar Nandi

One of the biggest challenges associated with connected and autonomous vehicles (CAVs) is to maintain and make use of vehicle health records (VHR). VHR can help different entities offer various services in a proactive, transparent, secure, reliable and efficient manner. The state-of-the-art solutions for maintaining the VHR are centralized in nature, mainly owned by manufacturers and authorized in-vehicle device developers. Owners, drivers, and other key service providers have limited access to and control over the VHR. We need to change the strategy from single or limited party access to multi-party access to VHR in a secure manner so that all stakeholders of the intelligent transportation system (ITS) can benefit from it. Any unauthorized attempt to alter the data should also be prevented. Blockchain is one potential candidate, which can facilitate the sharing of such data among different participating organizations and individuals. For example, owners, manufacturers, trusted third parties, road authorities, insurance companies, charging stations, and car selling ventures can access VHR stored on the blockchain in a permissioned and secure manner, with a higher level of confidence. In this paper, V-CARE, a blockchain-based decentralized secure system, is proposed to manage records in an interoperable framework that leads to improved ITS services in terms of safety, availability, reliability, efficiency, and maintenance. Insurance based on pay-how-you-drive (PHYD), and sale and purchase of used vehicles can also be made more transparent and reliable without compromising the confidentiality and security of sensitive data.
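The requirement that unauthorized alterations be prevented rests on a basic blockchain property: each record is hashed together with the hash of its predecessor, so changing any stored record invalidates every later link. The sketch below illustrates only that property with Python's standard `hashlib`. It is not the V-CARE design, which the abstract does not detail; the record fields, the all-zero genesis hash and the function names are assumptions, and consensus, permissioning and signatures are omitted entirely.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # assumed placeholder for "no previous record"


def record_hash(record: Dict, prev_hash: str) -> str:
    """SHA-256 over the canonical JSON of the record plus the previous hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: List[Dict], record: Dict) -> None:
    """Link a new health record to the current tip of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"record": record, "prev_hash": prev, "hash": record_hash(record, prev)}
    )


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edited record or broken link fails the check."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != record_hash(block["record"], prev):
            return False
        prev = block["hash"]
    return True


# Hypothetical vehicle health records.
chain: List[Dict] = []
append_record(chain, {"vehicle": "V-001", "odometer_km": 42000, "event": "service"})
append_record(chain, {"vehicle": "V-001", "odometer_km": 45500, "event": "brake check"})
intact = verify_chain(chain)

# Rolling back the odometer in the first record is detected on verification.
chain[0]["record"]["odometer_km"] = 12000
tampered_detected = not verify_chain(chain)
```

A single hash chain only makes tampering evident to whoever holds a trusted copy of the latest hash; replicating the chain across multiple parties is what stops one party from rewriting it wholesale.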

LG · May 4, 2020
PowerPlanningDL: Reliability-Aware Framework for On-Chip Power Grid Design using Deep Learning

Sukanta Dey, Sukumar Nandi, Gaurav Trivedi

With the increase in the complexity of chip designs, VLSI physical design, which is an iterative process, has become a time-consuming task. Power planning is the part of floorplanning in VLSI physical design where power grid networks are designed to provide adequate power to all the underlying functional blocks. Power planning also requires multiple iterative steps to create the power grid network while satisfying the allowed worst-case IR drop and Electromigration (EM) margin. For the first time, this paper introduces a deep learning (DL)-based framework to approximately predict the initial design of the power grid network, considering different reliability constraints. The proposed framework eliminates many iterative design steps and speeds up the total design cycle. A neural network-based multi-target regression technique is used to create the DL model. Features are extracted, and the training dataset is generated from the floorplans of some of the power grid designs extracted from the IBM processor. The DL model is trained using the generated dataset. The proposed DL-based framework is validated using a new set of power grid specifications (obtained by perturbing the designs used in the training phase). The results show that the predicted power grid design is close to the original design, with minimal prediction error (~2%). The proposed DL-based approach also improves the design cycle time with a speedup of ~6X for standard power grid benchmarks.

NI · Jun 6, 2013
An Active Host-Based Intrusion Detection System for ARP-Related Attacks and its Verification

Ferdous A Barbhuiya, Santosh Biswas, Sukumar Nandi

Spoofing with a falsified IP-MAC pair is the first step in most LAN-based attacks. The Address Resolution Protocol (ARP) is stateless, which is the main reason spoofing is possible. Several network-level and host-level mechanisms have been proposed to detect and mitigate ARP spoofing, but each has its own drawbacks. In this paper we propose a host-based intrusion detection system for LAN attacks, which works without extra constraints such as static IP-MAC bindings or modifications to ARP. The proposed scheme is verified under all possible attack scenarios. The scheme is successfully validated in a test bed with various attack scenarios, and the results show the effectiveness of the proposed technique.
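To make the falsified IP-MAC pair concrete, the sketch below flags ARP observations that contradict an earlier binding for the same IP address. This is a simple passive consistency check, not the active scheme the paper proposes, whose probing and verification steps the abstract does not describe. The function name, the tuple layout and the addresses are hypothetical, and the check assumes the first binding seen for an IP is genuine, which an attacker who answers first can violate.

```python
from typing import Dict, List, Tuple

# (IP address, MAC address) as claimed in an observed ARP reply
Observation = Tuple[str, str]
# (IP address, previously recorded MAC, newly claimed MAC)
Conflict = Tuple[str, str, str]


def find_conflicts(observations: List[Observation]) -> List[Conflict]:
    """Report every reply whose MAC differs from the first MAC seen for that IP."""
    bindings: Dict[str, str] = {}
    conflicts: List[Conflict] = []
    for ip, mac in observations:
        # setdefault records the first MAC seen for this IP and returns it.
        known = bindings.setdefault(ip, mac)
        if known != mac:
            conflicts.append((ip, known, mac))
    return conflicts


# Hypothetical traffic: the third reply claims the gateway IP with a new MAC.
observed = [
    ("10.0.0.1", "aa:aa:aa:aa:aa:01"),
    ("10.0.0.2", "aa:aa:aa:aa:aa:02"),
    ("10.0.0.1", "bb:bb:bb:bb:bb:99"),
]
alerts = find_conflicts(observed)
```

A conflict alone does not say which of the two MAC addresses is the genuine one, and legitimate changes such as a replaced network card also trigger it, which is part of why passive checks like this are usually paired with some form of verification.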