CL Nov 24, 2025
LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti
Tabia Tanzin Prama, Christopher M. Danforth, Peter Sheridan Dodds
Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource contexts remains underexplored. This study presents the first systematic investigation of LLM-based machine translation (MT) for Sylheti, a dialect of Bangla that is itself low-resource. We evaluate five advanced LLMs (GPT-4.1, GPT-4.1, LLaMA 4, Grok 3, and DeepSeek V3.2) across both translation directions (Bangla $\Leftrightarrow$ Sylheti), and find that these models struggle with dialect-specific vocabulary. To address this, we introduce Sylheti-CAP (Context-Aware Prompting), a three-step framework that embeds a linguistic rulebook, a dictionary (2,260 core vocabulary items and idioms), and an authenticity check directly into prompts. Extensive experiments show that Sylheti-CAP consistently improves translation quality across models and prompting strategies. Both automatic metrics and human evaluations confirm its effectiveness, while qualitative analysis reveals notable reductions in hallucinations, ambiguities, and awkward phrasing, establishing Sylheti-CAP as a scalable solution for dialectal and low-resource MT. Dataset link: https://github.com/TabiaTanzin/LLMs-for-Low-Resource-Dialect-Translation-Using-Context-Aware-Prompting-A-Case-Study-on-Sylheti.git
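The three prompt components named in the abstract (rulebook, dictionary, authenticity check) could be assembled along the following lines. This is only a minimal sketch: the function name `build_cap_prompt`, the prompt wording, the substring-based dictionary lookup, and the placeholder tokens are all assumptions made for illustration and are not taken from the paper or its dataset.

```python
def build_cap_prompt(source_text, rules, glossary, direction="Bangla to Sylheti"):
    """Assemble a context-aware translation prompt (hypothetical layout).

    rules    -- list of rulebook statements (strings)
    glossary -- dict mapping source-side terms to target-side terms
    """
    # Part 1: the linguistic rulebook, one rule per bullet.
    rule_block = "\n".join(f"- {rule}" for rule in rules)

    # Part 2: only dictionary entries whose source term occurs in the input
    # (plain substring match, an assumption made for brevity).
    hits = {src: tgt for src, tgt in glossary.items() if src in source_text}
    dict_block = "\n".join(f"- {src} -> {tgt}" for src, tgt in hits.items())
    if not dict_block:
        dict_block = "- (no matching entries)"

    # Part 3: an instruction asking the model to check its own output.
    check_block = (
        "Before answering, verify that the translation follows the rules "
        "and uses the dictionary terms listed above."
    )

    return (
        f"Translate from {direction}.\n\n"
        f"Rules:\n{rule_block}\n\n"
        f"Dictionary:\n{dict_block}\n\n"
        f"Authenticity check:\n{check_block}\n\n"
        f"Text:\n{source_text}"
    )


# Placeholder tokens stand in for real vocabulary; no actual entries are shown.
example = build_cap_prompt(
    "foo SRC_WORD_A bar",
    rules=["rule one"],
    glossary={"SRC_WORD_A": "TGT_WORD_A", "SRC_WORD_B": "TGT_WORD_B"},
)
print(example)
```

Filtering the dictionary to entries that match the input keeps the prompt short; whether the authors include the full dictionary or a filtered subset is not stated in the abstract.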
CV Feb 23, 2025
Optimized Custom CNN for Real-Time Tomato Leaf Disease Detection
Mangsura Kabir Oni, Tabia Tanzin Prama
In Bangladesh, tomatoes are a staple vegetable, prized for their versatility in various culinary applications. However, the cultivation of tomatoes is often hindered by a range of diseases that can significantly reduce crop yields and quality. Early detection of these diseases is crucial for implementing timely interventions and ensuring the sustainability of tomato production. Traditional manual inspection methods, while effective, are labor-intensive and prone to human error. To address these challenges, this study develops an automated disease detection system using Convolutional Neural Networks (CNNs). A comprehensive dataset of tomato leaves was collected from the Brahmanbaria district, preprocessed to enhance image quality, and then used to evaluate several deep learning models. Comparative performance analysis was conducted between YOLOv5, MobileNetV2, ResNet18, and our custom CNN model. In our study, the Custom CNN model achieved an accuracy of 95.2%, significantly outperforming the other models, which achieved accuracies of 77%, 89.38%, and 71.88%, respectively. While the other models showed solid performance, our Custom CNN demonstrated superior results specifically tailored for the task of tomato leaf disease detection. These findings highlight the strong potential of deep learning techniques for improving early disease detection in tomato crops. By leveraging these technologies, farmers can detect diseases at an early stage, allowing for more effective management practices. This approach not only promises to boost tomato yields but also contributes to the sustainability and resilience of the agricultural sector, helping to mitigate the impact of plant diseases on crop production.
CY Nov 28, 2025
Misalignment of LLM-Generated Personas with Human Perceptions in Low-Resource Settings
Tabia Tanzin Prama, Christopher M. Danforth, Peter Sheridan Dodds
Recent advances enable Large Language Models (LLMs) to generate AI personas, yet their lack of deep contextual, cultural, and emotional understanding poses a significant limitation. This study quantitatively compared human responses with those of eight LLM-generated social personas (e.g., Male, Female, Muslim, Political Supporter) within a low-resource environment, Bangladesh, using culturally specific questions. Results show that human responses significantly outperform all LLMs in answering questions and across all metrics of persona perception, with particularly large gaps in empathy and credibility. Furthermore, LLM-generated content exhibited a systematic bias along the lines of the "Pollyanna Principle", scoring measurably higher in positive sentiment ($\Phi_{avg} = 5.99$ for LLMs vs. $5.60$ for humans). These findings suggest that LLM personas do not accurately reflect the authentic experience of real people in resource-scarce environments. It is essential to validate LLM personas against real-world human data to ensure their alignment and reliability before deploying them in social science research.
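The abstract does not define the average sentiment score it reports. One common way to compute such a score is a frequency-weighted mean of per-word ratings, and the sketch below assumes that form. The function name `average_sentiment`, the whitespace tokenization, and the two-word lexicon are hypothetical; the paper's actual lexicon and scale are not given here.

```python
from collections import Counter


def average_sentiment(text, word_scores):
    """Frequency-weighted mean of per-word sentiment scores.

    Words missing from word_scores are ignored. Returns None when no
    word in the text has a score.
    """
    # Count only the words that the lexicon covers.
    counts = Counter(w for w in text.lower().split() if w in word_scores)
    total = sum(counts.values())
    if total == 0:
        return None
    # Each word contributes its score once per occurrence.
    return sum(word_scores[w] * n for w, n in counts.items()) / total


# Toy lexicon with made-up scores, for illustration only.
toy_scores = {"good": 8.0, "bad": 2.0}
print(average_sentiment("good good bad unknown", toy_scores))
```

Under this definition, a text that uses highly rated words more often gets a higher average, which is the kind of positivity gap between LLM and human responses that the abstract describes.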
CL Oct 20, 2025
Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
Mangsura Kabir Oni, Tabia Tanzin Prama
Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
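The abstract mentions character-level fidelity without naming the metric behind it. As an illustration of what a character-level score can look like, the sketch below computes an F1 over character n-grams for a single n. This is a simplified stand-in chosen for brevity, not the metric used in the paper and not a full implementation of any standard metric; the function names and the default `n=2` are assumptions.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_ngram_f1(hypothesis, reference, n=2):
    """F1 between the character n-gram multisets of two strings."""
    hyp = char_ngrams(hypothesis, n)
    ref = char_ngrams(reference, n)
    # Multiset intersection keeps the minimum count of each shared n-gram.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Latin placeholders are used instead of real Bengali or Sylheti text.
print(char_ngram_f1("abcd", "abce"))
```

Character-level scores of this kind reward partially correct word forms, which matters for closely related varieties where a translation may differ from the reference by only a few characters.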
LG Mar 4, 2025
AI Enabled User-Specific Cyberbullying Severity Detection with Explainability
Tabia Tanzin Prama, Jannatul Ferdaws Amrin, Md. Mushfique Anwar et al.
The rise of social media has significantly increased the prevalence of cyberbullying (CB), posing serious risks to both mental and physical well-being. Effective detection systems are essential for mitigating its impact. While several machine learning (ML) models have been developed, few incorporate victims' psychological, demographic, and behavioral factors alongside bullying comments to assess severity. In this study, we propose an AI model integrating user-specific attributes, including psychological factors (self-esteem, anxiety, depression), online behavior (internet usage, disciplinary history), and demographic attributes (race, gender, ethnicity), along with social media comments. Additionally, we introduce a re-labeling technique that categorizes social media comments into three severity levels: Not Bullying, Mild Bullying, and Severe Bullying, considering user-specific factors. Our LSTM model is trained using 146 features, incorporating emotional, topical, and word2vec representations of social media comments as well as user-level attributes, and it outperforms existing baseline models, achieving the highest accuracy of 98% and an F1-score of 0.97. To identify key factors influencing the severity of cyberbullying, we employ explainable AI techniques (SHAP and LIME) to interpret the model's decision-making process. Our findings reveal that, beyond hate comments, victims belonging to specific racial and gender groups are more frequently targeted and exhibit higher incidences of depression, disciplinary issues, and low self-esteem. Additionally, individuals with a prior history of bullying are at a greater risk of becoming victims of cyberbullying.