CLApr 26, 2023
ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short SummariesRaian Rahman, Rizvi Hasan, Abdullah Al Farhad et al.
Automatic chart to text summarization is an effective tool for the visually impaired people along with providing precise insights of tabular data in natural language to the user. A large and well-structured dataset is always a key part for data driven models. In this paper, we propose ChartSumm: a large-scale benchmark dataset consisting of a total of 84,363 charts along with their metadata and descriptions covering a wide range of topics and chart types to generate short and long summaries. Extensive experiments with strong baseline models show that even though these models generate fluent and informative summaries by achieving decent scores in various automatic evaluation metrics, they often face issues like suffering from hallucination, missing out important data points, in addition to incorrect explanation of complex trends in the charts. We also investigated the potential of expanding ChartSumm to other languages using automated translation tools. These make our dataset a challenging benchmark for future research.
CLAug 16, 2024
BnSentMix: A Diverse Bengali-English Code-Mixed Dataset for Sentiment AnalysisSadia Alam, Md Farhan Ishmam, Navid Hasin Alvee et al.
The widespread availability of code-mixed data can provide valuable insights into low-resource languages like Bengali, which have limited datasets. Sentiment analysis has been a fundamental text classification task across several languages for code-mixed data. However, there has yet to be a large-scale and diverse sentiment analysis dataset on code-mixed Bengali. We address this limitation by introducing BnSentMix, a sentiment analysis dataset on code-mixed Bengali consisting of 20,000 samples with 4 sentiment labels from Facebook, YouTube, and e-commerce sites. We ensure diversity in data sources to replicate realistic code-mixed scenarios. Additionally, we propose 14 baseline methods including novel transformer encoders further pre-trained on code-mixed Bengali-English, achieving an overall accuracy of 69.8% and an F1 score of 69.1% on sentiment classification tasks. Detailed analyses reveal variations in performance across different sentiment labels and text types, highlighting areas for future improvement.
CVJul 3, 2024
Visual Robustness Benchmark for Visual Question Answering (VQA)Md Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat et al.
Can Visual Question Answering (VQA) systems perform just as well when deployed in the real world? Or are they susceptible to realistic corruption effects e.g. image blur, which can be detrimental in sensitive applications, such as medical VQA? While linguistic or textual robustness has been thoroughly explored in the VQA literature, there has yet to be any significant work on the visual robustness of VQA models. We propose the first large-scale benchmark comprising 213,000 augmented images, challenging the visual robustness of multiple VQA models and assessing the strength of realistic visual corruptions. Additionally, we have designed several robustness evaluation metrics that can be aggregated into a unified metric and tailored to fit a variety of use cases. Our experiments reveal several insights into the relationships between model size, performance, and robustness with the visual corruptions. Our benchmark highlights the need for a balanced approach in model development that considers model performance without compromising the robustness.
LGJan 12, 2025
A Pan-cancer Classification Model using Multi-view Feature Selection Method and Ensemble ClassifierTareque Mohmud Chowdhury, Farzana Tabassum, Sabrina Islam et al.
Accurately identifying cancer samples is crucial for precise diagnosis and effective patient treatment. Traditional methods falter with high-dimensional and high feature-to-sample count ratios, which are critical for classifying cancer samples. This study aims to develop a novel feature selection framework specifically for transcriptome data and propose two ensemble classifiers. For feature selection, we partition the transcriptome dataset vertically based on feature types. Then apply the Boruta feature selection process on each of the partitions, combine the results, and apply Boruta again on the combined result. We repeat the process with different parameters of Boruta and prepare the final feature set. Finally, we constructed two ensemble ML models based on LR, SVM and XGBoost classifiers with max voting and averaging probability approach. We used 10-fold cross-validation to ensure robust and reliable classification performance. With 97.11\% accuracy and 0.9996 AUC value, our approach performs better compared to existing state-of-the-art methods to classify 33 types of cancers. A set of 12 types of cancer is traditionally challenging to differentiate between each other due to their similarity in tissue of origin. Our method accurately identifies over 90\% of samples from these 12 types of cancers, which outperforms all known methods presented in existing literature. The gene set enrichment analysis reveals that our framework's selected features have enriched the pathways highly related to cancers. This study develops a feature selection framework to select features highly related to cancer development and leads to identifying different types of cancer samples with higher accuracy.
LGJul 18, 2025
FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated LearningMd Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain
Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. While numerous static defenses exist, their effectiveness is highly context-dependent, often failing against adaptive adversaries or in heterogeneous data environments. This paper introduces FedStrategist, a novel meta-learning framework that reframes robust aggregation as a real-time, cost-aware control problem. We design a lightweight contextual bandit agent that dynamically selects the optimal aggregation rule from an arsenal of defenses based on real-time diagnostic metrics. Through comprehensive experiments, we demonstrate that no single static rule is universally optimal. We show that our adaptive agent successfully learns superior policies across diverse scenarios, including a ``Krum-favorable" environment and against a sophisticated "stealth" adversary designed to neutralize specific diagnostic signals. Critically, we analyze the paradoxical scenario where a non-robust baseline achieves high but compromised accuracy, and demonstrate that our agent learns a conservative policy to prioritize model integrity. Furthermore, we prove the agent's policy is controllable via a single "risk tolerance" parameter, allowing practitioners to explicitly manage the trade-off between performance and security. Our work provides a new, practical, and analyzable approach to creating resilient and intelligent decentralized AI systems.
IRMay 1, 2019
Disease Identification From Unstructured User InputFahim Faisal, Shafkat Ahmed Bhuiyan, Abu Raihan Mostofa Kamal
A method to identify probable diseases from the unstructured textual input (eg, health forum posts) by incorporating a lexicographic and semantic feature based two-phase text classification module and a symptom-disease correlation-based similarity measurement module. One notable aspect of my approach was to develop a competent algorithm to extract all inherent features from the data source to make a better decision.