28.5 LG Apr 21
Federated Learning over Blockchain-Enabled Cloud Infrastructure
Saloni Garg, Amit Sagtani, Kamal Kant Hiran
The rise of IoT devices and the uptake of cloud computing have ushered in a new era of data-driven intelligence. Traditional centralized machine learning, which requires large volumes of data to be stored in a single location, is increasingly susceptible to data breaches, privacy violations, and regulatory non-compliance. This report examines the integration of Federated Learning (FL) and blockchain technology in a cloud-edge setting, presenting it as an effective solution to these concerns. We propose a four-dimensional architectural categorization that assesses the coordination frameworks, consensus algorithms, data storage practices, and trust models relevant to these integrated systems. We then compare two recent frameworks, the Multi-Objectives Reinforcement Federated Learning Blockchain (MORFLB), designed for intelligent transportation systems, and the Federated Blockchain-IoT Framework for Sustainable Healthcare Systems (FBCI-SHS), and describe their distinctive contributions and inherent limitations. We also review the literature, comparing current frameworks to position this work within existing research. The report concludes by outlining the principal challenges and a roadmap for future research, emphasizing adaptive, resilient, and standardized BCFL systems across diverse application domains.
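As a rough illustration of the FL-plus-blockchain pairing described above, the sketch below averages client model updates weighted by sample count (the standard FedAvg rule) and records each round's global weights in a hash-linked ledger. It is a generic toy under stated assumptions, not the MORFLB or FBCI-SHS design: the function names, the block layout, and the absence of any consensus protocol are all choices made for illustration.

```python
import hashlib
import json


def fed_avg(updates):
    """Sample-weighted average of client weight vectors (FedAvg rule).

    `updates` is a list of (num_samples, weights) pairs; this layout is
    an assumption made for the sketch.
    """
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [sum(n * w[i] for n, w in updates) / total for i in range(dim)]


def block_hash(body):
    """SHA-256 digest of a block body serialized with sorted keys."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_round(chain, global_weights):
    """Append one training round to a hash-linked list of blocks."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"round": len(chain), "prev_hash": prev, "weights": global_weights}
    chain.append({**body, "hash": block_hash(body)})
    return chain


def verify(chain):
    """Check that every block links to its predecessor and is unmodified."""
    prev = "0" * 64
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev_hash"] != prev or block_hash(body) != block["hash"]:
            return False
        prev = block["hash"]
    return True


# Hypothetical two-round run with made-up client updates.
ledger = []
append_round(ledger, fed_avg([(100, [0.2, 0.4]), (300, [0.6, 0.8])]))
append_round(ledger, fed_avg([(50, [1.0, 0.0]), (50, [0.0, 1.0])]))
```

A real deployment would replace the in-memory list with a distributed ledger and add a consensus step before a block is accepted; the hash linkage only makes later tampering with recorded weights detectable.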
3.4 LG Apr 17
ECG-Lens: Benchmarking ML & DL Models on PTB-XL Dataset
Saloni Garg, Ukant Jadia, Amit Sagtani et al.
Automated classification of electrocardiogram (ECG) signals is a useful tool for diagnosing and monitoring cardiovascular diseases. This study compares three traditional machine learning algorithms (Decision Tree Classifier, Random Forest Classifier, and Logistic Regression) and three deep learning models (Simple Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Complex CNN (ECG-Lens)) for classifying ECG signals from the PTB-XL dataset, which contains 12-lead recordings from normal patients and patients with various cardiac conditions. The DL models were trained on raw ECG signals, allowing them to extract discriminative features automatically. Data augmentation using the Stationary Wavelet Transform (SWT) was applied to enhance model performance and increase the diversity of training samples while preserving the essential characteristics of the ECG signals. The models were evaluated using multiple metrics, including accuracy, precision, recall, F1-score, and ROC-AUC. The ECG-Lens model achieved the highest performance, with 80% classification accuracy and a 90% ROC-AUC. These findings demonstrate that deep learning architectures, particularly complex CNNs, substantially outperform traditional ML methods on raw 12-lead ECG data. They also provide a practical benchmark for selecting automated ECG classification models and for identifying directions for condition-specific model development.
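The evaluation metrics named above can be sketched in plain Python for a single binary (one-vs-rest) label. This is only an illustration of the standard definitions: the function names and the toy labels and scores are invented here, and the abstract does not say how the study averages metrics across PTB-XL classes.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties count as half a win. This pairwise form is O(P * N) and is meant
    for small examples only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): true labels, model scores, and a 0.5 threshold.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
report = binary_metrics(labels, preds)
auc = roc_auc(labels, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC depends only on how the scores rank the samples, which is why a model can report different values for accuracy and ROC-AUC on the same test set.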
42.2 LG May 8
Unsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifacts
Saloni Garg, Amit Sagtani
Efficient routing across multiple LLMs enables cost-quality tradeoffs by directing queries to the cheapest capable model. Prior work attributes limits on routing headroom to an "unsolvability ceiling": queries that no model in the pool can solve. We present a large-scale study of multi-tier LLM routing with 206,000 query-model pairs across six benchmarks (MMLU, MedQA, HumanEval, MBPP, Alpaca, ShareGPT) using the Gemma 4 and Llama 3.1 families. Evaluating with both LLM-as-a-judge and exact-match metrics, we show that a substantial portion of reported unsolvability stems from evaluation artifacts: (i) systematic judge biases favoring verbosity over correctness, (ii) truncation under fixed generation budgets, and (iii) output format mismatches. Through dual-judge validation and exact-match grounding, we reduce measured unsolvability across tasks. We introduce a decomposition framework that attributes failures to these artifacts, revealing consistent patterns across domains and model families. These artifacts also distort router training signals: standard routers collapse to majority-class prediction (the smallest tier is optimal for ~79% of queries), as confirmed by random-feature and shuffled-label controls, incurring a 13-17 percentage point opportunity cost. We provide actionable recommendations, including dual-judge validation, exact-match anchoring, and cost-sensitive objectives. Our findings suggest that existing routing headroom estimates are substantially inflated, underscoring the need for reliable evaluation protocols in multi-LLM systems.
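To make the quantities in this abstract concrete, the following sketch computes an unsolvable rate, the accuracy of an oracle router that picks the cheapest capable tier, and the accuracy of a router that has collapsed to always choosing the smallest tier. The definitions are assumptions for illustration (in particular, "opportunity cost" is taken here as the accuracy gap between the oracle and the collapsed router, which may differ from the paper's definition), and the outcome table is made up.

```python
def cheapest_capable(solved_by_tier):
    """Index of the cheapest tier that solves the query, or None.

    `solved_by_tier` is ordered from smallest (cheapest) to largest model.
    """
    for tier, solved in enumerate(solved_by_tier):
        if solved:
            return tier
    return None


def routing_summary(outcomes):
    """Summarize routing headroom from per-query, per-tier correctness."""
    n = len(outcomes)
    labels = [cheapest_capable(row) for row in outcomes]
    unsolvable = sum(1 for label in labels if label is None) / n
    oracle_acc = 1.0 - unsolvable            # oracle solves every solvable query
    smallest_acc = sum(1 for row in outcomes if row[0]) / n
    return {
        "unsolvable_rate": unsolvable,
        "oracle_accuracy": oracle_acc,
        "always_smallest_accuracy": smallest_acc,
        # Assumed definition: accuracy lost by never escalating.
        "opportunity_cost_pp": 100 * (oracle_acc - smallest_acc),
    }


# Hypothetical outcomes for 10 queries over 3 tiers (small, medium, large).
toy_outcomes = (
    [(True, True, True)] * 7        # smallest tier already suffices
    + [(False, True, True)] * 2     # needs escalation
    + [(False, False, False)]       # counted as unsolvable
)
summary = routing_summary(toy_outcomes)
```

Under this framing, a query marked incorrect for every tier because of a judge bias, truncation, or a format mismatch inflates `unsolvable_rate` and lowers `oracle_accuracy`, even though a model may have produced a correct answer.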