Nusrat Jahan Prottasha

CL
h-index37
14papers
129citations
Novelty45%
AI Score43

14 Papers

LGNov 4, 2022
Impact Learning: A Learning Method from Features Impact and Competition

Nusrat Jahan Prottasha, Saydul Akbar Murad, Abu Jafar Md Muzahid et al.

Machine learning is the study of computer algorithms that can automatically improve based on data and experience. Machine learning algorithms build a model from sample data, called training data, to make predictions or judgments without being explicitly programmed to do so. A variety of wellknown machine learning algorithms have been developed for use in the field of computer science to analyze data. This paper introduced a new machine learning algorithm called impact learning. Impact learning is a supervised learning algorithm that can be consolidated in both classification and regression problems. It can furthermore manifest its superiority in analyzing competitive data. This algorithm is remarkable for learning from the competitive situation and the competition comes from the effects of autonomous features. It is prepared by the impacts of the highlights from the intrinsic rate of natural increase (RNI). We, moreover, manifest the prevalence of the impact learning over the conventional machine learning algorithm.

CLSep 17, 2024
Propulsion: Steering LLM with Tiny Fine-Tuning

Md Kowsher, Nusrat Jahan Prottasha, Prakash Bhat

The rapid advancements in Large Language Models (LLMs) have revolutionized natural language processing (NLP) and related fields. However, fine-tuning these models for specific tasks remains computationally expensive and risks degrading pre-learned features. To address these challenges, we propose Propulsion, a novel parameter efficient fine-tuning (PEFT) method designed to optimize task-specific performance while drastically reducing computational overhead. Inspired by the concept of controlled adjustments in physical motion, Propulsion selectively re-scales specific dimensions of a pre-trained model, guiding output predictions toward task objectives without modifying the model's parameters. By introducing lightweight, trainable Propulsion parameters at the pre-trained layer, we minimize the number of parameters updated during fine-tuning, preventing overfitting or overwriting of existing knowledge. Our theoretical analysis, supported by Neural Tangent Kernel (NTK) theory, shows that Propulsion approximates the performance of full fine-tuning with far fewer trainable parameters. Empirically, Propulsion reduces the parameter count from 355.3 million to just 0.086 million, achieving over a 10x reduction compared to standard approaches like LoRA while maintaining competitive performance across benchmarks.

LGJan 9
Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning

Nusrat Jahan Prottasha, Md Kowsher, Chun-Nam Yu et al.

Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage and training cost. This undermines the core goal of parameter-efficient fine-tuning. We propose Monkey Jump, a method that brings mixture-of-experts-style specialization to parameter-efficient fine-tuning without introducing extra trainable parameters for experts or routers. Instead of adding new adapters as experts, Monkey Jump treats the adapters already present in each Transformer block (such as query, key, value, up, and down projections) as implicit experts and routes tokens among them. Routing is performed using k-means clustering with exponentially moving averaged cluster centers, requiring no gradients and no learned parameters. We theoretically show that token-wise routing increases expressivity and can outperform shared adapters by avoiding cancellation effects. Across multi-task experiments covering 14 text, 14 image, and 19 video benchmarks, Monkey Jump achieves competitive performance with mixture-of-experts-based parameter-efficient fine-tuning methods while using 7 to 29 times fewer trainable parameters, up to 48 percent lower memory consumption, and 1.5 to 2 times faster training. Monkey Jump is architecture-agnostic and can be applied to any adapter-based parameter-efficient fine-tuning method.

LGFeb 1
LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

Md Kowsher, Haris Mansoor, Nusrat Jahan Prottasha et al.

MoE-PEFT methods combine Mixture of Experts with parameter-efficient fine-tuning for multi-task adaptation, but require separate adapters per expert causing trainable parameters to scale linearly with expert count and limiting applicability to adapter-based architectures. We propose LiME (Lightweight Mixture of Experts), which achieves expert specialization through lightweight modulation rather than adapter replication. Instead of separate adapters, LiME uses a single shared PEFT module and modulates its output with lightweight expert vectors, reducing expert parameters while generalizing to any PEFT method. Notably, LiME introduces zero-parameter routing by leveraging existing frozen and adapted representations eliminating learned router parameters typically required per layer. Theoretically, we prove that (i) more experts preserve more task-relevant information and (ii) modulation approximates full expert-specific PEFT with bounded error. LiME further incorporates n-gram windowed routing and adaptive expert selection (Auto Top-K) based on routing confidence. Experiments on MMT-47, a multimodal multi-task benchmark with 47 tasks spanning text, image, and video, demonstrate that LiME achieves competitive or superior performance while using up to 4x fewer trainable parameters and up to 29% faster training compared to corresponding MoE-PEFT baselines.

CLFeb 15, 2025Code
User Profile with Large Language Models: Construction, Updating, and Benchmarking

Nusrat Jahan Prottasha, Md Kowsher, Hafijur Raman et al.

User profile modeling plays a key role in personalized systems, as it requires building accurate profiles and updating them with new information. In this paper, we present two high-quality open-source user profile datasets: one for profile construction and another for profile updating. These datasets offer a strong basis for evaluating user profile modeling techniques in dynamic settings. We also show a methodology that uses large language models (LLMs) to tackle both profile construction and updating. Our method uses a probabilistic framework to predict user profiles from input text, allowing for precise and context-aware profile generation. Our experiments demonstrate that models like Mistral-7b and Llama2-7b perform strongly in both tasks. LLMs improve the precision and recall of the generated profiles, and high evaluation scores confirm the effectiveness of our approach.

CLOct 11, 2024
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning

Nusrat Jahan Prottasha, Asif Mahmud, Md. Shohanur Islam Sobuj et al.

Large Language Models (LLMs) are gaining significant popularity in recent years for specialized tasks using prompts due to their low computational cost. Standard methods like prefix tuning utilize special, modifiable tokens that lack semantic meaning and require extensive training for best performance, often falling short. In this context, we propose a novel method called Semantic Knowledge Tuning (SK-Tuning) for prompt and prefix tuning that employs meaningful words instead of random tokens. This method involves using a fixed LLM to understand and process the semantic content of the prompt through zero-shot capabilities. Following this, it integrates the processed prompt with the input text to improve the model's performance on particular tasks. Our experimental results show that SK-Tuning exhibits faster training times, fewer parameters, and superior performance on tasks such as text classification and understanding compared to other tuning methods. This approach offers a promising method for optimizing the efficiency and effectiveness of LLMs in processing language tasks.

CLApr 19, 2025
PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models

Nusrat Jahan Prottasha, Upama Roy Chowdhury, Shetu Mohanto et al.

Large models such as Large Language Models (LLMs) and Vision Language Models (VLMs) have transformed artificial intelligence, powering applications in natural language processing, computer vision, and multimodal learning. However, fully fine-tuning these models remains expensive, requiring extensive computational resources, memory, and task-specific data. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a promising solution that allows adapting large models to downstream tasks by updating only a small portion of parameters. This survey presents a comprehensive overview of PEFT techniques, focusing on their motivations, design principles, and effectiveness. We begin by analyzing the resource and accessibility challenges posed by traditional fine-tuning and highlight key issues, such as overfitting, catastrophic forgetting, and parameter inefficiency. We then introduce a structured taxonomy of PEFT methods -- grouped into additive, selective, reparameterized, hybrid, and unified frameworks -- and systematically compare their mechanisms and trade-offs. Beyond taxonomy, we explore the impact of PEFT across diverse domains, including language, vision, and generative modeling, showing how these techniques offer strong performance with lower resource costs. We also discuss important open challenges in scalability, interpretability, and robustness, and suggest future directions such as federated learning, domain adaptation, and theoretical grounding. Our goal is to provide a unified understanding of PEFT and its growing role in enabling practical, efficient, and sustainable use of large models.

CLDec 21, 2023
L-TUNING: Synchronized Label Tuning for Prompt and Prefix in LLMs

Md. Kowsher, Md. Shohanur Islam Sobuj, Asif Mahmud et al.

Efficiently fine-tuning Large Language Models (LLMs) for specific tasks presents a considerable challenge in natural language processing. Traditional methods, like prompt or prefix tuning, typically rely on arbitrary tokens for training, leading to prolonged training times and generalized token use across various class labels. To address these issues, this paper introduces L-Tuning, an efficient fine-tuning approach designed for classification tasks within the Natural Language Inference (NLI) framework. Diverging from conventional methods, L-Tuning focuses on the fine-tuning of label tokens processed through a pre-trained LLM, thereby harnessing its pre-existing semantic knowledge. This technique not only improves the fine-tuning accuracy and efficiency but also facilitates the generation of distinct label embeddings for each class, enhancing the model's training nuance. Our experimental results indicate a significant improvement in training efficiency and classification accuracy with L-Tuning compared to traditional approaches, marking a promising advancement in fine-tuning LLMs for complex language tasks.

CLFeb 25, 2025
Predicting Through Generation: Why Generation Is Better for Prediction

Md Kowsher, Nusrat Jahan Prottasha, Prakash Bhat et al.

This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground truth tokens during training but relies on its own predictions during inference, leading to errors, and (2) format mismatch, where discrete tokens do not always align with the tasks required output structure. To address these challenges, we introduce PredGen(Predicting Through Generating), an end to end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks.

CLNov 30, 2024
Does Self-Attention Need Separate Weights in Transformers?

Md Kowsher, Nusrat Jahan Prottasha, Chun-Nam Yu et al.

The success of self-attention lies in its ability to capture long-range dependencies and enhance context understanding, but it is limited by its computational complexity and challenges in handling sequential data with inherent directionality. This work introduces a shared weight self-attention-based BERT model that only learns one weight matrix for (Key, Value, and Query) representations instead of three individual matrices for each of them. Our shared weight attention reduces the training parameter size by more than half and training time by around one-tenth. Furthermore, we demonstrate higher prediction accuracy on small tasks of GLUE over the BERT baseline and in particular a generalization power on noisy and out-of-domain data. Experimental results indicate that our shared self-attention method achieves a parameter size reduction of 66.53% in the attention block. In the GLUE dataset, the shared weight self-attention-based BERT model demonstrates accuracy improvements of 0.38%, 5.81%, and 1.06% over the standard, symmetric, and pairwise attention-based BERT models, respectively. The model and source code are available at Anonymous.

LGOct 15, 2024
LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting

Md Kowsher, Md. Shohanur Islam Sobuj, Nusrat Jahan Prottasha et al.

Time series forecasting remains a challenging task, particularly in the context of complex multiscale temporal patterns. This study presents LLM-Mixer, a framework that improves forecasting accuracy through the combination of multiscale time-series decomposition with pre-trained LLMs (Large Language Models). LLM-Mixer captures both short-term fluctuations and long-term trends by decomposing the data into multiple temporal resolutions and processing them with a frozen LLM, guided by a textual prompt specifically designed for time-series data. Extensive experiments conducted on multivariate and univariate datasets demonstrate that LLM-Mixer achieves competitive performance, outperforming recent state-of-the-art models across various forecasting horizons. This work highlights the potential of combining multiscale analysis and LLMs for effective and scalable time-series forecasting.

CLApr 3, 2024
Token Trails: Navigating Contextual Depths in Conversational AI with ChatLLM

Md. Kowsher, Ritesh Panditi, Nusrat Jahan Prottasha et al.

Conversational modeling using Large Language Models (LLMs) requires a nuanced understanding of context to generate coherent and contextually relevant responses. In this paper, we present Token Trails, a novel approach that leverages token-type embeddings to navigate the intricate contextual nuances within conversations. Our framework utilizes token-type embeddings to distinguish between user utterances and bot responses, facilitating the generation of context-aware replies. Through comprehensive experimentation and evaluation, we demonstrate the effectiveness of Token Trails in improving conversational understanding and response generation, achieving state-of-the-art performance. Our results highlight the significance of contextual modeling in conversational AI and underscore the promising potential of Token Trails to advance the field, paving the way for more sophisticated and contextually aware chatbot interactions.

CLJun 1, 2025
FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models

Md Kowsher, Nusrat Jahan Prottasha, Shiyun Xu et al.

Bidirectional language models have better context understanding and perform better than unidirectional models on natural language understanding tasks, yet the theoretical reasons behind this advantage remain unclear. In this work, we investigate this disparity through the lens of the Information Bottleneck (IB) principle, which formalizes a trade-off between compressing input information and preserving task-relevant content. We propose FlowNIB, a dynamic and scalable method for estimating mutual information during training that addresses key limitations of classical IB approaches, including computational intractability and fixed trade-off schedules. Theoretically, we show that bidirectional models retain more mutual information and exhibit higher effective dimensionality than unidirectional models. To support this, we present a generalized framework for measuring representational complexity and prove that bidirectional representations are strictly more informative under mild conditions. We further validate our findings through extensive experiments across multiple models and tasks using FlowNIB, revealing how information is encoded and compressed throughout training. Together, our work provides a principled explanation for the effectiveness of bidirectional architectures and introduces a practical tool for analyzing information flow in deep language models.

CLFeb 9, 2025
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting

Mohammad Jahid Ibna Basher, Md Kowsher, Md Saiful Islam et al.

This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis using minimal training data. Building upon the XTTS architecture, our approach integrates Bangla into a multilingual TTS pipeline, with modifications to account for the phonetic and linguistic characteristics of the language. We pre-train BnTTS on 3.85k hours of Bangla speech dataset with corresponding text labels and evaluate performance in both zero-shot and few-shot settings on our proposed test dataset. Empirical evaluations in few-shot settings show that BnTTS significantly improves the naturalness, intelligibility, and speaker fidelity of synthesized Bangla speech. Compared to state-of-the-art Bangla TTS systems, BnTTS exhibits superior performance in Subjective Mean Opinion Score (SMOS), Naturalness, and Clarity metrics.