Md Muhtasim Munif Fahim

5papers

Novelty46%

AI Score45

Ranked #64,834 of 205,806 authors (top 32%)#14,597 in LG (top 34%)

5 Papers

LGJan 30

Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting

Md Muhtasim Munif Fahim, Soyda Humyra Yesmin, Saiful Islam et al.

We introduce Green-NAS, a multi-objective NAS (neural architecture search) framework designed for low-resource environments using weather forecasting as a case study. By adhering to 'Green AI' principles, the framework explicitly minimizes computational energy costs and carbon footprints, prioritizing sustainable deployment over raw computational scale. The Green-NAS architecture search method is optimized for both model accuracy and efficiency to find lightweight models with high accuracy and very few model parameters; this is accomplished through an optimization process that simultaneously optimizes multiple objectives. Our best-performing model, Green-NAS-A, achieved an RMSE of 0.0988 (i.e., within 1.4% of our manually tuned baseline) using only 153k model parameters, which is 239 times fewer than other globally applied weather forecasting models, such as GraphCast. In addition, we also describe how the use of transfer learning will improve the weather forecasting accuracy by approximately 5.2%, in comparison to a naive approach of training a new model for each city, when there is limited historical weather data available for that city.

0.3CLMar 17

A Multi-Model Approach to English-Bangla Sentiment Classification of Government Mobile Banking App Reviews

Md. Naim Molla, Md Muhtasim Munif Fahim, Md. Binyamin et al.

For millions of users in developing economies who depend on mobile banking as their primary gateway to financial services, app quality directly shapes financial access. The study analyzed 5,652 Google Play reviews in English and Bangla (filtered from 11,414 raw reviews) for four Bangladeshi government banking apps. The authors used a hybrid labeling approach that combined use of the reviewer's star rating for each review along with a separate independent XLM-RoBERTa classifier to produce moderate inter-method agreement (kappa = 0.459). Traditional models outperformed transformer-based ones: Random Forest produced the highest accuracy (0.815), while Linear SVM produced the highest weighted F1 score (0.804); both were higher than the performance of fine-tuned XLM-RoBERTa (0.793). McNemar's test confirmed that all classical models were significantly superior to the off-the-shelf XLM-RoBERTa (p < 0.05), while differences with the fine-tuned variant were not statistically significant. DeBERTa-v3 was applied to analyze the sentiment at the aspect level across the reviews for the four apps; the reviewers expressed their dissatisfaction primarily with the speed of transactions and with the poor design of interfaces; eJanata app received the worst ratings from the reviewers across all apps. Three policy recommendations are made based on these findings - remediation of app quality, trust-centred release management, and Bangla-first NLP adoption - to assist state-owned banks in moving towards improving their digital services through data-driven methods. Notably, a 16.1-percentage-point accuracy gap between Bangla and English text highlights the need for low-resource language model development.

LGJan 28

The Depth Delusion: Why Transformers Should Be Wider, Not Deeper

Md Muhtasim Munif Fahim, Md Rezaul Karim

Neural scaling laws describe how language model loss decreases with parameters and data, but treat architecture as interchangeable--a billion parameters could arise from a shallow-wide model (10 layers & 8,192 hidden dimension) or a deep-narrow one (80 layers & 2,048 hidden dimension). We propose architecture-conditioned scaling laws decomposing this dependence, finding that optimal depth scales as D* ~ C^0.12 while optimal width scales as W* ~ C^0.34, meaning width should grow 2.8x faster than depth. We discover a critical depth phenomenon: beyond D_crit ~ W^0.44 (sublinear in W), adding layers increases loss despite adding parameters--the Depth Delusion. Empirically, we validate these findings across 30 transformer architectures spanning 17M to 7B parameters, each trained on representative high-compute samples, achieving R^2 = 0.922. Our central finding: at 7B scale, a 64-layer model (6.38B params) underperforms a 32-layer model (6.86B params) by 0.12 nats, despite being significantly deeper. This demonstrates that optimal depth-width tradeoffs persist at the production scale.

LGJan 28

Pre-trained Encoders for Global Child Development: Transfer Learning Enables Deployment in Data-Scarce Settings

Md Muhtasim Munif Fahim, Md Rezaul Karim

A large number of children experience preventable developmental delays each year, yet the deployment of machine learning in new countries has been stymied by a data bottleneck: reliable models require thousands of samples, while new programs begin with fewer than 100. We introduce the first pre-trained encoder for global child development, trained on 357,709 children across 44 countries using UNICEF survey data. With only 50 training samples, the pre-trained encoder achieves an average AUC of 0.65 (95% CI: 0.56-0.72), outperforming cold-start gradient boosting at 0.61 by 8-12% across regions. At N=500, the encoder achieves an AUC of 0.73. Zero-shot deployment to unseen countries achieves AUCs up to 0.84. We apply a transfer learning bound to explain why pre-training diversity enables few-shot generalization. These results establish that pre-trained encoders can transform the feasibility of ML for SDG 4.2.1 monitoring in resource-constrained settings.

LGFeb 3

Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study

Md Muhtasim Munif Fahim, Md Rezaul Karim

The predictive machine learning models for child mortality tend to be inaccurate when applied to future populations, since they suffer from look-ahead bias due to the randomization used in cross-validation. The Demographic and Health Surveys (DHS) data from Bangladesh for 2011-2022, with n = 33,962, are used in this paper. We trained the model on (2011-2014) data, validated it on 2017 data, and tested it on 2022 data. Eight years after the initial test of the model, a genetic algorithm-based Neural Architecture Search found a single-layer neural architecture (with 64 units) to be superior to XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). Additionally, through a detailed fairness audit, we identified an overall "Socioeconomic Predictive Gradient," with a positive correlation between regional poverty level (r = -0.62) and the algorithm's AUC. In addition, we found that the model performed at its highest levels in the least affluent divisions (AUC 0.74) and decreased dramatically in the wealthiest divisions (AUC 0.66). These findings suggest that the model is identifying areas with the greatest need for intervention. Our model would identify approximately 1300 additional at-risk children annually than a Gradient Boosting model when screened at the 10% level and validated using SHAP values and Platt Calibration, and therefore provide a robust, production-ready computational phenotype for targeted maternal and child health interventions.