Md. Rakibul Islam

h-index13

4papers

54citations

Novelty34%

AI Score26

Ranked #161,145 of 194,257 authors (top 83%)#35,291 in LG (top 88%)

4 Papers

2.6LGJan 31, 2024

An Experiment on Feature Selection using Logistic Regression

Raisa Islam, Subhasish Mazumdar, Rakibul Islam

In supervised machine learning, feature selection plays a very important role by potentially enhancing explainability and performance as measured by computing time and accuracy-related metrics. In this paper, we investigate a method for feature selection based on the well-known L1 and L2 regularization strategies associated with logistic regression (LR). It is well known that the learned coefficients, which serve as weights, can be used to rank the features. Our approach is to synthesize the findings of L1 and L2 regularization. For our experiment, we chose the CIC-IDS2018 dataset owing partly to its size and also to the existence of two problematic classes that are hard to separate. We report first with the exclusion of one of them and then with its inclusion. We ranked features first with L1 and then with L2, and then compared logistic regression with L1 (LR+L1) against that with L2 (LR+L2) by varying the sizes of the feature sets for each of the two rankings. We found no significant difference in accuracy between the two methods once the feature set is selected. We chose a synthesis, i.e., only those features that were present in both the sets obtained from L1 and that from L2, and experimented with it on more complex models like Decision Tree and Random Forest and observed that the accuracy was very close in spite of the small size of the feature set. Additionally, we also report on the standard metrics: accuracy, precision, recall, and f1-score.

9.9SEAug 30, 2019

An Empirical Study of the Relationships between Code Readability and Software Complexity

Duaa Alawad, Manisha Panta, Minhaz Zibran et al.

Code readability and software complexity are important software quality metrics that impact other software metrics such as maintainability, reusability, portability and reliability. This paper presents an empirical study of the relationships between code readability and program complexity. The results are derived from an analysis of 35 Java programs that cover 23 distinct code constructs. The analysis includes six readability metrics and two complexity metrics. Our study empirically confirms the existing wisdom that readability and complexity are negatively correlated. Applying a machine learning technique, we also identify and rank those code constructs that substantially affect code readability.

7.1LGJan 23, 2019Code

Effectiveness of Tree-based Ensembles for Anomaly Discovery: Insights, Batch and Streaming Active Learning

Shubhomoy Das, Md Rakibul Islam, Nitthilan Kannappan Jayakodi et al.

In many real-world AD applications including computer security and fraud prevention, the anomaly detector must be configurable by the human analyst to minimize the effort on false positives. One important way to configure the detector is by providing true labels (nominal or anomaly) for a few instances. Recent work on active anomaly discovery has shown that greedily querying the top-scoring instance and tuning the weights of ensemble detectors based on label feedback allows us to quickly discover true anomalies. This paper makes four main contributions to improve the state-of-the-art in anomaly discovery using tree-based ensembles. First, we provide an important insight that explains the practical successes of unsupervised tree-based ensembles and active learning based on greedy query selection strategy. We also present empirical results on real-world data to support our insights and theoretical analysis to support active learning. Second, we develop a novel batch active learning algorithm to improve the diversity of discovered anomalies based on a formalism called compact description to describe the discovered anomalies. Third, we develop a novel active learning algorithm to handle streaming data setting. We present a data drift detection algorithm that not only detects the drift robustly, but also allows us to take corrective actions to adapt the anomaly detector in a principled manner. Fourth, we present extensive experiments to evaluate our insights and our tree-based active anomaly discovery algorithms in both batch and streaming data settings. Our results show that active learning allows us to discover significantly more anomalies than state-of-the-art unsupervised baselines, our batch active learning algorithm discovers diverse anomalies, and our algorithms under the streaming-data setup are competitive with the batch setup.

2.2LGOct 2, 2018Code

GLAD: GLocalized Anomaly Detection via Human-in-the-Loop Learning

Md Rakibul Islam, Shubhomoy Das, Janardhan Rao Doppa et al.

Human analysts that use anomaly detection systems in practice want to retain the use of simple and explainable global anomaly detectors. In this paper, we propose a novel human-in-the-loop learning algorithm called GLAD (GLocalized Anomaly Detection) that supports global anomaly detectors. GLAD automatically learns their local relevance to specific data instances using label feedback from human analysts. The key idea is to place a uniform prior on the relevance of each member of the anomaly detection ensemble over the input feature space via a neural network trained on unlabeled instances. Subsequently, weights of the neural network are tuned to adjust the local relevance of each ensemble member using all labeled instances. GLAD also provides explanations which can improve the understanding of end-users about anomalies. Our experiments on synthetic and real-world data show the effectiveness of GLAD in learning the local relevance of ensemble members and discovering anomalies via label feedback.