Mehdi Bahrami

SE
5 papers
749 citations
Novelty 32%
AI Score 26

5 Papers

SE | Oct 16, 2021 | Code
AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Mehdi Bahrami, N. C. Shrikanth, Yuji Mizobuchi et al.

Code retrieval allows software engineers to search for code using a natural language query, and it relies on both natural language processing and software engineering techniques. There have been several attempts at code retrieval, ranging from searching code snippets to searching whole functions. In this paper, we introduce Augmented Code (AugmentedCode) retrieval, which takes advantage of existing information within the code and constructs an augmented programming language to improve the performance of code retrieval models. We curated a large corpus of Python code and showcased the framework and the results of the augmented programming language, which outperforms CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96, respectively. The best-performing fine-tuned augmented code retrieval model is published on HuggingFace at https://huggingface.co/Fujitsu/AugCode, and a demonstration video is available at https://youtu.be/mnZrUTANjGs.
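
The MRR figures above follow the standard retrieval definition: for each query, take the reciprocal of the rank at which the correct code appears (0 if it is not retrieved), then average over all queries, i.e. MRR = (1/|Q|) * sum of 1/rank_i. The Python sketch below computes it on made-up rankings; the function names and toy data are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank (1-based) of the relevant item, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results: Sequence[Sequence[str]], relevant: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries (one relevant id per query)."""
    if len(results) != len(relevant):
        raise ValueError("need exactly one relevant id per query")
    if not results:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in zip(results, relevant)) / len(results)


# Toy example: three queries, each with one ground-truth function id.
rankings = [
    ["f1", "f2", "f3"],  # correct answer at rank 1 -> 1.0
    ["f4", "f5", "f6"],  # correct answer at rank 2 -> 0.5
    ["f7", "f8", "f9"],  # correct answer not retrieved -> 0.0
]
gold = ["f1", "f5", "f0"]
print(mean_reciprocal_rank(rankings, gold))
```

Because only the position of the first correct hit counts, MRR rewards a model that puts the right function at the top of the list, which matches how a developer typically uses a code search result.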

SE | Oct 4, 2021 | Code
PyTorrent: A Python Library Corpus for Large-scale Language Models

Mehdi Bahrami, N. C. Shrikanth, Shade Ruangwan et al.

A large-scale collection of both semantic and natural language resources is essential for advancing active Software Engineering research areas such as code reuse and code comprehensibility. Existing machine learning models ingest data from Open Source repositories (like GitHub projects) and forum discussions (like Stackoverflow.com), whereas in this showcase we took a step back and assembled a corpus, titled PyTorrent, that contains 218,814 Python package libraries from the PyPI and Anaconda environments. We did so because earlier studies have shown that much of that code is redundant, whereas Python packages from these environments are of higher quality and well documented. PyTorrent enables users (such as data scientists and students) to build off-the-shelf machine learning models directly, without spending months of effort on large infrastructure. The dataset, schema, and a pretrained language model are available at https://github.com/fla-sil/PyTorrent
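
Since PyTorrent draws on well-documented package source, one natural way to turn a package into training data is to pair each function's code with its docstring. The sketch below does this for a single source file using Python's standard `ast` module; it is an illustration under assumptions, and the record fields (`module`, `name`, `docstring`, `code`) are hypothetical, not PyTorrent's published schema.

```python
import ast
import textwrap
from typing import Dict, List


def extract_functions(source: str, module_name: str) -> List[Dict[str, str]]:
    """Return one record per function or method found in `source`.

    The field names are hypothetical and do not reproduce PyTorrent's schema.
    """
    tree = ast.parse(source)
    records = []
    # ast.walk also visits nested functions and class methods.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "module": module_name,
                "name": node.name,
                # Cleaned docstring text, or "" when the function has none.
                "docstring": ast.get_docstring(node) or "",
                # Exact source text of the function (Python 3.8+).
                "code": ast.get_source_segment(source, node) or "",
            })
    return records


# Toy module standing in for one file of a downloaded package.
sample = textwrap.dedent('''
    def add(a, b):
        """Return the sum of a and b."""
        return a + b

    def _helper():
        pass
''')

for record in extract_functions(sample, "toy_pkg.math"):
    print(record["name"], "|", record["docstring"])
```

Functions without a docstring still produce a record with an empty `docstring`, so a downstream filter can decide whether to keep them.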

CL | Jun 3, 2021
A Systematic Investigation of KB-Text Embedding Alignment at Scale

Vardaan Pahuja, Yu Gu, Wenhu Chen et al.

Knowledge bases (KBs) and text often contain complementary knowledge: KBs store structured knowledge that can support long-range reasoning, while text stores more comprehensive and timely knowledge in an unstructured way. Separately embedding the individual knowledge sources into vector spaces has been tremendously successful at encoding the respective knowledge, but how to jointly embed and reason with both knowledge sources to fully leverage the complementary information is still largely an open problem. We conduct a large-scale, systematic investigation of aligning KB and text embeddings for joint reasoning. We set up a novel evaluation framework with two evaluation tasks, few-shot link prediction and analogical reasoning, and evaluate an array of KB-text embedding alignment methods. We also demonstrate how such alignment can infuse textual information into KB embeddings for more accurate link prediction on emerging entities and events, using COVID-19 as a case study.
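
To make "aligning KB and text embeddings" concrete, the sketch below shows one of the simplest objectives one could try: an L2 penalty that pulls the KB vector of each anchor entity (an entity present in both sources) toward its text vector, plus a fallback that uses the text vector for an emerging entity the KB has not seen. This is an illustrative assumption, not one of the specific methods evaluated in the paper; the entity names, 2-d vectors, learning rate, and the choice to keep text vectors fixed are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_loss(kb: Dict[str, Vector], text: Dict[str, Vector], anchors: List[str]) -> float:
    """Sum of squared distances between the KB and text vectors of each anchor entity."""
    return sum(squared_distance(kb[e], text[e]) for e in anchors)


def align_step(kb: Dict[str, Vector], text: Dict[str, Vector],
               anchors: List[str], lr: float = 0.1) -> None:
    """One gradient-descent step on the KB vectors; text vectors are treated as fixed."""
    for e in anchors:
        # Gradient of ||k - t||^2 with respect to k is 2 * (k - t).
        kb[e] = [k - lr * 2 * (k - t) for k, t in zip(kb[e], text[e])]


def vector_for(entity: str, kb: Dict[str, Vector], text: Dict[str, Vector]) -> Vector:
    """Use the KB vector when it exists; otherwise fall back to the text vector."""
    if entity in kb:
        return kb[entity]
    return text[entity]


# Toy 2-d embeddings (hypothetical); "COVID-19" appears in both sources.
kb = {"COVID-19": [0.0, 0.0], "influenza": [0.5, -0.5]}
text = {"COVID-19": [1.0, 1.0], "Omicron": [0.9, 1.1]}
before = alignment_loss(kb, text, ["COVID-19"])
align_step(kb, text, ["COVID-19"])
after = alignment_loss(kb, text, ["COVID-19"])
print(before, after)
print(vector_for("Omicron", kb, text))
```

The fallback is only meaningful once the two spaces are aligned, which is why the alignment step comes first; a real system would also keep optimizing the KB's own link-prediction objective alongside this penalty.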

DC | Jul 17, 2013
Cloud Template, a Big Data Solution

Mehdi Bahrami

Cloud computing has become a new concept for hosting and delivering services over the Internet, including big data solutions. Cloud computing is attractive to business owners of both small companies and large enterprises because it eliminates the need for users to plan ahead for provisioning and allows enterprises to start small and increase resources only when service demand rises. Although cloud computing offers huge opportunities to the IT industry, the development of cloud computing technology currently faces several issues. This study introduces cloud templates, which can be used for analyzing, designing, developing, and implementing cloud computing systems. We present a template-based design for cloud computing systems, highlighting its key concepts, architectural principles, and state-of-the-art implementation, as well as research challenges and future work requirements. The aim is to provide a better understanding of the design challenges of cloud computing and to identify important research directions in this increasingly important big data area. We describe a series of studies in which we and other researchers have assessed the effectiveness of these techniques in practical situations. Finally, we show how this idea could be implemented in a practical and useful way in industry.

SE | May 20, 2012
An overview to Software Architecture in Intrusion Detection System

Mehdi Bahrami, Mohammad Bahrami

As network systems grow, security has become a key feature of every network infrastructure. Network Intrusion Detection Systems (IDS) provide a defense model against security threats that can harm a network. An IDS can detect and block attack-related network traffic. However, network control is complex, and implementing an IDS can introduce delay into the network. Several software-based network intrusion detection systems have been developed, but they struggle with high-speed traffic. This paper reviews many types of software architecture in intrusion detection systems and describes the design and implementation of a high-performance network intrusion detection system that combines software-based network intrusion detection sensors with a network processor board. The network processor, a hardware-based component, acts as a customized load-balancing splitter. It cooperates with a set of modified content-based network intrusion detection sensors, rather than a single IDS, to process network traffic and cope with high-speed links.
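
The abstract describes the network processor acting as a load-balancing splitter in front of several detection sensors. A common way to build such a splitter, assumed here because the abstract does not give the paper's exact policy, is to hash each packet's flow 5-tuple so that every packet of a connection, in both directions, reaches the same sensor; this keeps stateful, content-based inspection correct while spreading load. In the sketch, `FlowKey` and `pick_sensor` are hypothetical names, and SHA-256 stands in for whatever cheaper hash a real network processor would use.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def pick_sensor(flow: FlowKey, num_sensors: int) -> int:
    """Map a flow to a sensor index so all packets of the flow reach the same sensor."""
    if num_sensors < 1:
        raise ValueError("need at least one sensor")
    # Order the two endpoints so both directions of a connection hash identically.
    a = (flow.src_ip, flow.src_port)
    b = (flow.dst_ip, flow.dst_port)
    lo, hi = sorted([a, b])
    key = f"{lo[0]}:{lo[1]}-{hi[0]}:{hi[1]}/{flow.protocol}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_sensors


# Client-to-server and server-to-client packets of the same TCP connection.
fwd = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, "tcp")
rev = FlowKey("10.0.0.2", "10.0.0.1", 80, 40000, "tcp")
print(pick_sensor(fwd, 4), pick_sensor(rev, 4))
```

Hashing per flow rather than per packet trades perfect balance for correctness: a sensor that sees only half of a TCP stream could miss an attack split across packets.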