Zahra Ghasemi

NE
h-index: 3
3 papers
7 citations
Novelty: 25%
AI Score: 16

3 Papers

LG Dec 2, 2022
Fake detection in imbalance dataset by Semi-supervised learning with GAN

Jinus Bordbar, Saman Ardalan, Mohammadreza Mohammadrezaie et al.

As social media continues to grow rapidly, the prevalence of harassment on these platforms has also increased. This has piqued the interest of researchers in the field of fake account detection. Social media data often form complex graphs with numerous nodes, which poses several challenges: a large number of irrelevant features in the matrices, high data dispersion, and an imbalanced class distribution within the dataset. To overcome these challenges, researchers have employed auto-encoders and a combination of semi-supervised learning with a GAN algorithm, referred to as SGAN. Our proposed method uses auto-encoders for feature extraction and incorporates SGAN. By leveraging an unlabeled dataset, the unsupervised layer of SGAN compensates for the limited availability of labeled data and makes efficient use of the few labeled instances. Multiple evaluation metrics were employed, including the confusion matrix and the ROC curve. The dataset was divided into training and testing sets, with 100 labeled samples for training and 1,000 samples for testing. The novelty of our research lies in applying SGAN to address the issue of imbalanced datasets in fake account detection. By optimizing the use of a smaller number of labeled instances and reducing the need for extensive computational power, our method offers a more efficient solution. Additionally, our study achieves 81% accuracy in detecting fake accounts using only 100 labeled samples. This demonstrates the potential of SGAN as a tool for handling minority classes and addressing big data challenges in fake account detection.
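The abstract names the confusion matrix and the ROC curve as evaluation metrics. The sketch below shows how both can be computed for a binary fake/real classifier, and why accuracy alone can look good on an imbalanced set. The labels and scores are made-up toy values, not data from the paper, and the 0.5 threshold is an assumption.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fake, 0 = real."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold, highest first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, _, _ = confusion_matrix(y_true, pred)
        points.append((fp / neg, tp / pos))
    return points


# Toy, imbalanced example (hypothetical values): 2 fake accounts out of 10.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.1, 0.6, 0.3, 0.2, 0.1, 0.4, 0.4, 0.05]

# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of fake accounts actually caught
roc = roc_points(y_true, scores)
```

In this toy case the accuracy is high while only half of the minority class is detected, which is why the confusion matrix and ROC curve are more informative than accuracy on imbalanced data.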

NE Dec 18, 2023
Enhanced Genetic Programming Models with Multiple Equations for Accurate Semi-Autogenous Grinding Mill Throughput Prediction

Zahra Ghasemi, Mehdi Nesht, Chris Aldrich et al.

Semi-autogenous grinding (SAG) mills play a pivotal role in the grinding circuit of mineral processing plants. Accurate prediction of SAG mill throughput, a crucial performance metric, is of utmost importance. The potential of applying genetic programming (GP) for this purpose has yet to be thoroughly investigated. This study introduces an enhanced GP approach, multi-equation GP (MEGP), for more accurate prediction of SAG mill throughput. In the proposed method, multiple equations are extracted, each accurately predicting mill throughput for a specific cluster of training data. These equations are then employed to predict mill throughput for test data using various approaches. To assess the effect of distance measures, four different distance measures are employed in the MEGP method. Comparative analysis reveals that the best MEGP approach achieves an average improvement of 10.74% in prediction accuracy compared with standard GP. In this approach, all extracted equations are used, and both the number of data points in each cluster and the distance to the clusters are incorporated in calculating the final prediction. Further investigation indicates that among the four distance measures employed (Euclidean, Manhattan, Chebyshev, and Cosine), the Euclidean distance yields the most accurate results for the majority of data splits.
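As a rough illustration of the best-performing approach described in the abstract, the sketch below implements the four distance measures and combines the outputs of several per-cluster equations into one prediction. The abstract does not give the exact weighting formula, so the weight used here (cluster size divided by distance to the cluster centroid) is an assumption, as are the names `combine_predictions`, `clusters`, and the toy equations.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def combine_predictions(x, clusters, distance=euclidean, eps=1e-9):
    """Weighted average of all per-cluster equation outputs for point x.

    ASSUMPTION: weight = cluster size / (distance to centroid + eps).
    The paper's actual weighting may differ.
    """
    weights, preds = [], []
    for c in clusters:
        d = distance(x, c["centroid"])
        weights.append(c["size"] / (d + eps))
        preds.append(c["equation"](x))
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


# Hypothetical clusters; the lambdas stand in for GP-evolved equations.
clusters = [
    {"centroid": (0.0, 0.0), "size": 30, "equation": lambda x: 2.0 * x[0] + 1.0},
    {"centroid": (4.0, 0.0), "size": 10, "equation": lambda x: 0.5 * x[0] + 3.0},
]
x = (1.0, 0.0)
prediction = combine_predictions(x, clusters, distance=euclidean)
```

Passing `manhattan`, `chebyshev`, or `cosine_distance` as the `distance` argument swaps the measure without changing the rest of the logic, which mirrors how the study compares the four measures.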

NE Jan 26, 2022
Multi-objective Semi-supervised Clustering for Finding Predictive Clusters

Zahra Ghasemi, Hadi Akbarzadeh Khorshidi, Uwe Aickelin

This study concentrates on clustering problems and aims to find compact clusters that are informative regarding the outcome variable. The main goal is to partition data points so that observations in each cluster are similar and, at the same time, the outcome variable can be predicted using these clusters. We model this semi-supervised clustering problem as a multi-objective optimization problem, with the deviation of data points in clusters and the prediction error of the outcome variable as two objective functions to be minimized. To find optimal clustering solutions, we employ the non-dominated sorting genetic algorithm II (NSGA-II), and local regression is applied as the prediction method for the outcome variable. To compare the performance of the proposed model, we compute seven models using five real-world data sets. Furthermore, we investigate the impact of using local regression for predicting the outcome variable in all models, and examine the performance of the multi-objective models compared to single-objective models.
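The two-objective formulation above relies on Pareto dominance: one clustering is preferred over another only if it is no worse on both objectives and strictly better on at least one. The sketch below extracts the non-dominated set from a list of candidate solutions. It covers only this first-front step, not the full genetic algorithm (no crowding distance, selection, crossover, or mutation), and the objective values are made-up toy numbers.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidates as (within-cluster deviation, prediction error).
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = non_dominated_front(candidates)
```

Here `(3.0, 4.0)` and `(2.5, 3.5)` are both dominated by `(2.0, 3.0)`, so the front keeps the three remaining trade-offs between compactness and prediction error.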