8.8DBApr 23
Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility ConstraintsMohamed Ragab, Faria Ferooz, Mohammad Bahrani et al.
In decentralized personal data ecosystems grounded in architectures such as Solid, users retain sovereignty over their data via personal online data stores (pods), hosted on Solid-compliant server infrastructures. In such environments, data remains under the control of pod owners, which complicates search due to distribution across numerous pods and user-specific access constraints. ESPRESSO is a decentralized framework for scalable keyword-based search across distributed Solid pods under user-defined visibility policies. It addresses key challenges of decentralized search by constructing WebID-scoped indexes within pods and employing privacy-aware metadata to enable efficient source selection and ranking across servers. This paper further introduces a formal threat model for ESPRESSO, analysing the security and privacy risks associated with the generation, aggregation, and use of indexes and metadata. These risks include unintended metadata leakage and the potential for adversaries to infer sensitive information about data that resides within personal data stores. The analysis identifies key design principles that limit metadata exposure while mitigating unauthorized inference. The proposed threat model provides a foundation for evaluating privacy-preserving decentralized search and informs the design of systems with stronger privacy guarantees.
CLDec 16, 2022
Utilizing distilBert transformer model for sentiment classification of COVID-19's Persian open-text responsesFatemeh Sadat Masoumi, Mohammad Bahrani
The COVID-19 pandemic has caused drastic alternations in human life in all aspects. The government's laws in this regard affected the lifestyle of all people. Due to this fact studying the sentiment of individuals is essential to be aware of the future impacts of the coming pandemics. To contribute to this aim, we proposed an NLP (Natural Language Processing) model to analyze open-text answers in a survey in Persian and detect positive and negative feelings of the people in Iran. In this study, a distilBert transformer model was applied to take on this task. We deployed three approaches to perform the comparison, and our best model could gain accuracy: 0.824, Precision: 0.824, Recall: 0.798, and F1 score: 0.804.
CLMar 27, 2023
ACO-tagger: A Novel Method for Part-of-Speech Tagging using Ant Colony OptimizationAmirhossein Mohammadi, Sara Hajiaghajani, Mohammad Bahrani
Swarm Intelligence algorithms have gained significant attention in recent years as a means of solving complex and non-deterministic problems. These algorithms are inspired by the collective behavior of natural creatures, and they simulate this behavior to develop intelligent agents for computational tasks. One such algorithm is Ant Colony Optimization (ACO), which is inspired by the foraging behavior of ants and their pheromone laying mechanism. ACO is used for solving difficult problems that are discrete and combinatorial in nature. Part-of-Speech (POS) tagging is a fundamental task in natural language processing that aims to assign a part-of-speech role to each word in a sentence. In this research paper, proposed a high-performance POS-tagging method based on ACO called ACO-tagger. This method achieved a high accuracy rate of 96.867%, outperforming several state-of-the-art methods. The proposed method is fast and efficient, making it a viable option for practical applications.
CLMay 16, 2022
Persian Abstract Meaning Representation: Annotation Guidelines and Gold Standard DatasetReza Takhshid, Tara Azin, Razieh Shojaei et al.
This paper introduces the Persian Abstract Meaning Representation (AMR) guidelines, a detailed guide for annotating Persian sentences with AMR, focusing on the necessary adaptations to fit Persian's unique syntactic structures. We discuss the development process of a Persian AMR gold standard dataset consisting of 1,562 sentences created following the guidelines. By examining the language specifications and nuances that distinguish AMR annotations of a low-resource language like Persian, we shed light on the challenges and limitations of developing a universal meaning representation framework. The guidelines and the dataset introduced in this study highlight such challenges, aiming to advance the field.
IRNov 1, 2018
Improving Information Retrieval Results for Persian Documents using FarsNetAdel Rahimi, Mohammad Bahrani
In this paper, we propose a new method for query expansion, which uses FarsNet (Persian WordNet) to find similar tokens related to the query and expand the semantic meaning of the query. For this purpose, we use synonymy relations in FarsNet and extract the related synonyms to query words. This algorithm is used to enhance information retrieval systems and improve search results. The overall evaluation of this system in comparison to the baseline method (without using query expansion) shows an improvement of about 9 percent in Mean Average Precision (MAP).
IROct 22, 2013
Improving the methods of email classification based on words ontologyForuzan Kiamarzpour, Rouhollah Dianat, Mohammad bahrani et al.
The Internet has dramatically changed the relationship among people and their relationships with others people and made the valuable information available for the users. Email is the service, which the Internet provides today for its own users; this service has attracted most of the users' attention due to the low cost. Along with the numerous benefits of Email, one of the weaknesses of this service is that the number of received emails is continually being enhanced, thus the ways are needed to automatically filter these disturbing letters. Most of these filters utilize a combination of several techniques such as the Black or white List, using the keywords and so on in order to identify the spam more accurately In this paper, we introduce a new method to classify the spam. We are seeking to increase the accuracy of Email classification by combining the output of several decision trees and the concept of ontology.