Andriy Miranskyy

SE
h-index9
30papers
477citations
Novelty26%
AI Score43

30 Papers

SEJul 19, 2019Code
On Usefulness of the Deep-Learning-Based Bug Localization Models to Practitioners

Sravya Polisetty, Andriy Miranskyy, Ayse Bener

Background: Developers spend a significant amount of time and efforts to localize bugs. In the literature, many researchers proposed state-of-the-art bug localization models to help developers localize bugs easily. The practitioners, on the other hand, expect a bug localization tool to meet certain criteria, such as trustworthiness, scalability, and efficiency. The current models are not capable of meeting these criteria, making it harder to adopt these models in practice. Recently, deep-learning-based bug localization models have been proposed in the literature, which show a better performance than the state-of-the-art models. Aim: In this research, we would like to investigate whether deep learning models meet the expectations of practitioners or not. Method: We constructed a Convolution Neural Network and a Simple Logistic model to examine their effectiveness in localizing bugs. We train these models on five open source projects written in Java and compare their performance with the performance of other state-of-the-art models trained on these datasets. Results: Our experiments show that although the deep learning models perform better than classic machine learning models, they meet the adoption criteria set by the practitioners only partially. Conclusions: This work provides evidence that the practitioners should be cautious while using the current state of the art models for production-level use-cases. It also highlights the need for standardization of performance benchmarks to ensure that bug localization models are assessed equitably and realistically.

DBMar 8, 2019Code
Automated data validation: an industrial experience report

Lei Zhang, Sean Howard, Tom Montpool et al.

There has been a massive explosion of data generated by customers and retained by companies in the last decade. However, there is a significant mismatch between the increasing volume of data and the lack of automation methods and tools. The lack of best practices in data science programming may lead to software quality degradation, release schedule slippage, and budget overruns. To mitigate these concerns, we would like to bring software engineering best practices into data science. Specifically, we focus on automated data validation in the data preparation phase of the software development life cycle. This paper studies a real-world industrial case and applies software engineering best practices to develop an automated test harness called RESTORE. We release RESTORE as an open-source R package. Our experience report, done on the geodemographic data, shows that RESTORE enables efficient and effective detection of errors injected during the data preparation phase. RESTORE also significantly reduced the cost of testing. We hope that the community benefits from the open-source project and the practical advice based on our experience.

SEApr 27
A Course on the Introduction to Quantum Software Engineering: Experience Report

Andriy Miranskyy

Quantum computing is increasingly practiced through programming, yet most educational offerings emphasize algorithmic or framework-level use rather than software engineering concerns such as testing, abstraction, tooling, and lifecycle management. This paper reports on the design and first offering of a cross-listed undergraduate--graduate course that frames quantum computing through a software engineering lens, focusing on early-stage competence relevant to software engineering practice. The course integrates foundational quantum concepts with software engineering perspectives, emphasizing executable artifacts, empirical reasoning, and trade-offs arising from probabilistic behaviour, noise, and evolving toolchains. Evidence is drawn from instructor observations, supplemented by anonymous student feedback, a background survey, and inspection of student work. Despite minimal prior exposure to quantum computing, students were able to engage productively with quantum software engineering topics once a foundational understanding of quantum information and quantum algorithms, expressed through executable artifacts, was established. This experience report contributes a modular course design, a scalable assessment model for mixed academic levels, and transferable lessons for software engineering educators developing quantum computing curricula.

LGAug 5, 2024
On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization

Andriy Miranskyy, Adam Sorrenti, Viral Thakar

The effectiveness of training neural networks directly impacts computational costs, resource allocation, and model development timelines in machine learning applications. An optimizer's ability to train the model adequately (in terms of trained model performance) depends on the model's initial weights. Model weight initialization schemes use pseudorandom number generators (PRNGs) as a source of randomness. We investigate whether substituting PRNGs for low-discrepancy quasirandom number generators (QRNGs) -- namely Sobol' sequences -- as a source of randomness for initializers can improve model performance. We examine Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Transformer architectures trained on MNIST, CIFAR-10, and IMDB datasets using SGD and Adam optimizers. Our analysis uses ten initialization schemes: Glorot, He, Lecun (both Uniform and Normal); Orthogonal, Random Normal, Truncated Normal, and Random Uniform. Models with weights set using PRNG- and QRNG-based initializers are compared pairwise for each combination of dataset, architecture, optimizer, and initialization scheme. Our findings indicate that QRNG-based neural network initializers either reach a higher accuracy or achieve the same accuracy more quickly than PRNG-based initializers in 60% of the 120 experiments conducted. Thus, using QRNG-based initializers instead of PRNG-based initializers can speed up and improve model training.

CLDec 7, 2023
On Sarcasm Detection with OpenAI GPT-based Models

Montgomery Gole, Williams-Paul Nwadiugwu, Andriy Miranskyy

Sarcasm is a form of irony that requires readers or listeners to interpret its intended meaning by considering context and social cues. Machine learning classification models have long had difficulty detecting sarcasm due to its social complexity and contradictory nature. This paper explores the applications of the Generative Pretrained Transformer (GPT) models, including GPT-3, InstructGPT, GPT-3.5, and GPT-4, in detecting sarcasm in natural language. It tests fine-tuned and zero-shot models of different sizes and releases. The GPT models were tested on the political and balanced (pol-bal) portion of the popular Self-Annotated Reddit Corpus (SARC 2.0) sarcasm dataset. In the fine-tuning case, the largest fine-tuned GPT-3 model achieves accuracy and $F_1$-score of 0.81, outperforming prior models. In the zero-shot case, one of GPT-4 models yields an accuracy of 0.70 and $F_1$-score of 0.75. Other models score lower. Additionally, a model's performance may improve or deteriorate with each release, highlighting the need to reassess performance after each release.

QUANT-PHApr 27
Improving Zero-Noise Extrapolation via Physically Bounded Models

Andriy Miranskyy, Adam Sorrenti, Jasmine Thind et al.

Zero-noise extrapolation (ZNE) mitigates errors in near-term quantum devices by extrapolating measurements obtained at amplified noise levels to estimate noise-free expectation values. In practice, commonly used extrapolation models are fitted without enforcing physical constraints, which can yield predictions outside the valid range of quantum observables. In this work, we introduce physically bounded variants of polynomial, exponential, and polynomial--exponential extrapolation models by explicitly parameterizing the zero-noise estimate and constraining it during optimization. We evaluate the approach using a large synthetic benchmark comprising 180,000 circuits and approximately 3.6 million ZNE experiments generated under realistic device noise models derived from IBM quantum backends. We also perform preliminary validation on real quantum hardware using GHZ and W-state circuits. Across the synthetic benchmark, bounded extrapolation substantially reduces unphysical predictions and improves the stability of exponential- and polynomial--exponential-family models, whereas polynomial models show little difference between bounded and unbounded variants. Hardware experiments show similar qualitative behaviour: bounded models generally avoid pathological extrapolations and often provide a more reliable balance between accuracy and usable coverage. At the same time, the results highlight practical limitations of current devices, including stronger-than-expected noise effects and variability not fully captured by simulation models. These results suggest that enforcing physical constraints during extrapolation improves the reliability of ZNE and that this approach can be incorporated into existing workflows with minimal modification.

LGNov 13, 2024
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset

Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi et al.

As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.

SEOct 31, 2024
Automating Quantum Software Maintenance: Flakiness Detection and Root Cause Analysis

Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy et al.

Flaky tests, which pass or fail inconsistently without code changes, are a major challenge in software engineering in general and in quantum software engineering in particular due to their complexity and probabilistic nature, leading to hidden issues and wasted developer effort. We aim to create an automated framework to detect flaky tests in quantum software and an extended dataset of quantum flaky tests, overcoming the limitations of manual methods. Building on prior manual analysis of 14 quantum software repositories, we expanded the dataset and automated flaky test detection using transformers and cosine similarity. We conducted experiments with Large Language Models (LLMs) from the OpenAI GPT and Meta LLaMA families to assess their ability to detect and classify flaky tests from code and issue descriptions. Embedding transformers proved effective: we identified 25 new flaky tests, expanding the dataset by 54%. Top LLMs achieved an F1-score of 0.8871 for flakiness detection but only 0.5839 for root cause identification. We introduced an automated flaky test detection framework using machine learning, showing promising results but highlighting the need for improved root cause detection and classification in large quantum codebases. Future work will focus on improving detection techniques and developing automatic flaky test fixes.

SEJul 16, 2025
QSpark: Towards Reliable Qiskit Code Generation

Kiana Kheiri, Aamna Aamir, Andriy Miranskyy et al.

Quantum circuits must be error-resilient, yet LLMs like Granite-20B-Code and StarCoder often output flawed Qiskit code. We fine-tuned the Qwen2.5-Coder-32B model with two RL methods, Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), using a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO reaches 56.29% Pass@1 ($\approx+10$ pp over Granite-8B-QK) and GRPO hits 49%, both beating all general-purpose baselines; on the original HumanEval they score 65.90% and 63.00%. GRPO performs well on basic tasks (44/78) and excels on intermediate ones (41/68), but neither GRPO nor ORPO solves any of the five advanced tasks, highlighting clear gains yet room for progress in AI-assisted quantum programming.

CLApr 8, 2025
Assessing how hyperparameters impact Large Language Models' sarcasm detection performance

Montgomery Gole, Andriy Miranskyy

Sarcasm detection is challenging for both humans and machines. This work explores how model characteristics impact sarcasm detection in OpenAI's GPT, and Meta's Llama-2 models, given their strong natural language understanding, and popularity. We evaluate fine-tuned and zero-shot models across various sizes, releases, and hyperparameters. Experiments were conducted on the political and balanced (pol-bal) portion of the popular Self-Annotated Reddit Corpus (SARC2.0) sarcasm dataset. Fine-tuned performance improves monotonically with model size within a model family, while hyperparameter tuning also impacts performance. In the fine-tuning scenario, full precision Llama-2-13b achieves state-of-the-art accuracy and $F_1$-score, both measured at 0.83, comparable to average human performance. In the zero-shot setting, one GPT-4 model achieves competitive performance to prior attempts, yielding an accuracy of 0.70 and an $F_1$-score of 0.75. Furthermore, a model's performance may increase or decline with each release, highlighting the need to reassess performance after each release.

ETJan 10, 2022
EP-PQM: Efficient Parametric Probabilistic Quantum Memory with Fewer Qubits and Gates

Mushahid Khan, Jean Paul Latyr Faye, Udson C. Mendes et al.

Machine learning (ML) classification tasks can be carried out on a quantum computer (QC) using Probabilistic Quantum Memory (PQM) and its extension, Parameteric PQM (P-PQM) by calculating the Hamming distance between an input pattern and a database of $r$ patterns containing $z$ features with $a$ distinct attributes. For accurate computations, the feature must be encoded using one-hot encoding, which is memory-intensive for multi-attribute datasets with $a>2$. We can easily represent multi-attribute data more compactly on a classical computer by replacing one-hot encoding with label encoding. However, replacing these encoding schemes on a QC is not straightforward as PQM and P-PQM operate at the quantum bit level. We present an enhanced P-PQM, called EP-PQM, that allows label encoding of data stored in a PQM data structure and reduces the circuit depth of the data storage and retrieval procedures. We show implementations for an ideal QC and a noisy intermediate-scale quantum (NISQ) device. Our complexity analysis shows that the EP-PQM approach requires $O\left(z \log_2(a)\right)$ qubits as opposed to $O(za)$ qubits for P-PQM. EP-PQM also requires fewer gates, reducing gate count from $O\left(rza\right)$ to $O\left(rz\log_2(a)\right)$. For five datasets, we demonstrate that training an ML classification model using EP-PQM requires 48% to 77% fewer qubits than P-PQM for datasets with $a>2$. EP-PQM reduces circuit depth in the range of 60% to 96%, depending on the dataset. The depth decreases further with a decomposed circuit, ranging between 94% and 99%. EP-PQM requires less space; thus, it can train on and classify larger datasets than previous PQM implementations on NISQ devices. Furthermore, reducing the number of gates speeds up the classification and reduces the noise associated with deep quantum circuits. Thus, EP-PQM brings us closer to scalable ML on a NISQ device.

SEOct 16, 2021
Making existing software quantum safe: a case study on IBM Db2

Lei Zhang, Andriy Miranskyy, Walid Rjaibi et al.

The software engineering community is facing challenges from quantum computers (QCs). In the era of quantum computing, Shor's algorithm running on QCs can break asymmetric encryption algorithms that classical computers practically cannot. Though the exact date when QCs will become "dangerous" for practical problems is unknown, the consensus is that this future is near. Thus, the software engineering community needs to start making software ready for quantum attacks and ensure quantum safety proactively. We argue that the problem of evolving existing software to quantum-safe software is very similar to the Y2K bug. Thus, we leverage some best practices from the Y2K bug and propose our roadmap, called 7E, which gives developers a structured way to prepare for quantum attacks. It is intended to help developers start planning for the creation of new software and the evolution of cryptography in existing software. In this paper, we use a case study to validate the viability of 7E. Our software under study is the IBM Db2 database system. We upgrade the current cryptographic schemes to post-quantum cryptographic ones (using Kyber and Dilithium schemes) and report our findings and lessons learned. We show that the 7E roadmap effectively plans the evolution of existing software security features towards quantum safety, but it does require minor revisions. We incorporate our experience with IBM Db2 into the revised 7E roadmap. The U.S. Department of Commerce's National Institute of Standards and Technology is finalizing the post-quantum cryptographic standard. The software engineering community needs to start getting prepared for the quantum advantage era. We hope that our experiential study with IBM Db2 and the 7E roadmap will help the community prepare existing software for quantum attacks in a structured manner.

SEAug 21, 2021
Term Interrelations and Trends in Software Engineering

Janusan Baskararajah, Lei Zhang, Andriy Miranskyy

The Software Engineering (SE) community is prolific, making it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. Therefore, we posit that the community may benefit from a tool extracting terms and their interrelations from the SE community's text corpus and showing terms' trends. In this paper, we build a prototyping tool using the word embedding technique. We train the embeddings on the SE Body of Knowledge handbook and 15,233 research papers' titles and abstracts. We also create test cases necessary for validation of the training of the embeddings. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base.

SEMar 16, 2021
On Testing and Debugging Quantum Software

Andriy Miranskyy, Lei Zhang, Javad Doliskani

Quantum computers are becoming more mainstream. As more programmers are starting to look at writing quantum programs, they need to test and debug their code. In this paper, we discuss various use-cases for quantum computers, either standalone or as part of a System of Systems. Based on these use-cases, we discuss some testing and debugging tactics that one can leverage to ensure the quality of the quantum software. We also highlight quantum-computer-specific issues and list novel techniques that are needed to address these issues. The practitioners can readily apply some of these tactics to their process of writing quantum programs, while researchers can learn about opportunities for future work.

SEFeb 12, 2021
On Automatic Parsing of Log Records

Jared Rand, Andriy Miranskyy

Software log analysis helps to maintain the health of software solutions and ensure compliance and security. Existing software systems consist of heterogeneous components emitting logs in various formats. A typical solution is to unify the logs using manually built parsers, which is laborious. Instead, we explore the possibility of automating the parsing task by employing machine translation (MT). We create a tool that generates synthetic Apache log records which we used to train recurrent-neural-network-based MT models. Models' evaluation on real-world logs shows that the models can learn Apache log format and parse individual log records. The median relative edit distance between an actual real-world log record and the MT prediction is less than or equal to 28%. Thus, we show that log parsing using an MT approach is promising.

DCOct 21, 2020
Anomaly Detection in a Large-scale Cloud Platform

Mohammad Saiful Islam, William Pourmajidi, Lei Zhang et al.

Cloud computing is ubiquitous: more and more companies are moving the workloads into the Cloud. However, this rise in popularity challenges Cloud service providers, as they need to monitor the quality of their ever-growing offerings effectively. To address the challenge, we designed and implemented an automated monitoring system for the IBM Cloud Platform. This monitoring system utilizes deep learning neural networks to detect anomalies in near-real-time in multiple Platform components simultaneously. After running the system for a year, we observed that the proposed solution frees the DevOps team's time and human resources from manually monitoring thousands of Cloud components. Moreover, it increases customer satisfaction by reducing the risk of Cloud outages. In this paper, we share our solutions' architecture, implementation notes, and best practices that emerged while evolving the monitoring system. They can be leveraged by other researchers and practitioners to build anomaly detectors for complex systems.

SESep 16, 2020
Immutable Log Storage as a Service on Private and Public Blockchains

William Pourmajidi, Lei Zhang, John Steinbacher et al.

Service Level Agreements (SLA) are employed to ensure the performance of Cloud solutions. When a component fails, the importance of logs increases significantly. All departments may turn to logs to determine the cause of the issue and find the party at fault. The party at fault may be motivated to tamper with the logs to hide their role. We argue that the critical nature of Cloud logs calls for immutability and verification mechanism without the presence of a single trusted party. This paper proposes such a mechanism by describing a blockchain-based log storage system, called Logchain, which can be integrated with existing private and public blockchain solutions. Logchain uses the immutability feature of blockchain to provide a tamper-resistance platform for log storage. Additionally, we propose a hierarchical structure to address blockchains' scalability issues. To validate the mechanism, we integrate Logchain into Ethereum and IBM Blockchain. We show that the solution is scalable and perform the analysis of the cost of ownership to help a reader select an implementation that would address their needs. The Logchain's scalability improvement on a blockchain is achieved without any alteration of blockchains' fundamental architecture. As shown in this work, it can function on private and public blockchains and, therefore, can be a suitable alternative for organizations that need a secure, immutable log storage platform.

SEMay 18, 2020
Anomaly Detection in Cloud Components

Mohammad Saiful Islam, Andriy Miranskyy

Cloud platforms, under the hood, consist of a complex inter-connected stack of hardware and software components. Each of these components can fail which may lead to an outage. Our goal is to improve the quality of Cloud services through early detection of such failures by analyzing resource utilization metrics. We tested Gated-Recurrent-Unit-based autoencoder with a likelihood function to detect anomalies in various multi-dimensional time series and achieved high performance.

SEJan 29, 2020
Is Your Quantum Program Bug-Free?

Andriy Miranskyy, Lei Zhang, Javad Doliskani

Quantum computers are becoming more mainstream. As more programmers are starting to look at writing quantum programs, they face an inevitable task of debugging their code. How should the programs for quantum computers be debugged? In this paper, we discuss existing debugging tactics, used in developing programs for classic computers, and show which ones can be readily adopted. We also highlight quantum-computer-specific debugging issues and list novel techniques that are needed to address these issues. The practitioners can readily apply some of these tactics to their process of writing quantum programs, while researchers can learn about opportunities for future work.

CRAug 28, 2019
Immutable Log Storage as a Service

William Pourmajidi, Lei Zhang, John Steinbacher et al.

Logs contain critical information about the quality of the rendered services on the Cloud and can be used as digital evidence. Hence, we argue that the critical nature of logs calls for immutability and verification mechanism without the presence of a single trusted party. In this paper, we propose a blockchain-based log system, called Logchain, which can be integrated with existing private and public blockchains. To validate the mechanism, we create Logchain as a Service (LCaaS) by integrating it with Ethereum public blockchain network. We show that the solution is scalable (being able to process 100 log files per second) and fast (being able to "seal" a log file in 23 seconds, on average).

SEJul 24, 2019
Quantum Advantage and Y2K Bug: Comparison

Lei Zhang, Andriy Miranskyy, Walid Rjaibi

Quantum Computers (QCs), once they mature, will be able to solve some problems faster than Classic Computers. This phenomenon is called "quantum advantage" (or a stronger term "quantum supremacy"). Quantum advantage will help us to speed up computations in many areas, from artificial intelligence to medicine. However, QC power can also be leveraged to break modern cryptographic algorithms, which pervade modern software: use cases range from encryption of Internet traffic, to encryption of disks, to signing blockchain ledgers. While the exact date when QCs will evolve to reach quantum advantage is unknown, the consensus is that this future is near. Thus, in order to maintain crypto agility of the software, one needs to start preparing for the era of quantum advantage proactively. In this paper, we recap the effect of quantum advantage on the existing and new software systems, as well as the data that we currently store. We also highlight similarities and differences between the security challenges brought by QCs and the challenges that software engineers faced twenty years ago while fixing widespread Y2K bug. Technically, the Y2K bug and the quantum advantage problems are different: the former was caused by timing-related problems, while the latter is caused by a cryptographic algorithm being non-quantum-resistant. However, conceptually, the problems are similar: we know what the root cause is, the fix (strategically) is straightforward, yet the implementation of the fix is challenging. To address the quantum advantage challenge, we create a seven-step roadmap, deemed 7E. It is inspired by the lessons-learnt from the Y2K era amalgamated with modern knowledge. The roadmap gives developers a structured way to start preparing for the quantum advantage era, helping them to start planning for the creation of new as well as the evolution of the existent software.

DCJul 13, 2019
Dogfooding: use IBM Cloud services to monitor IBM Cloud infrastructure

William Pourmajidi, Andriy Miranskyy, John Steinbacher et al.

The stability and performance of Cloud platforms are essential as they directly impact customers' satisfaction. Cloud service providers use Cloud monitoring tools to ensure that rendered services match the quality of service requirements indicated in established contracts such as service-level agreements. Given the enormous number of resources that need to be monitored, highly scalable and capable monitoring tools are designed and implemented by Cloud service providers such as Amazon, Google, IBM, and Microsoft. Cloud monitoring tools monitor millions of virtual and physical resources and continuously generate logs for each one of them. Considering that logs magnify any technical issue, they can be used for disaster detection, prevention, and recovery. However, logs are useless if they are not assessed and analyzed promptly. Thus, we argue that the scale of Cloud-generated logs makes it impossible for DevOps teams to analyze them effectively. This implies that one needs to automate the process of monitoring and analysis (e.g., using machine learning and artificial intelligence). If the automation will witness an anomaly in the logs --- it will alert DevOps staff. The automatic anomaly detectors require a reliable and scalable platform for gathering, filtering, and transforming the logs, executing the detector models, and sending out the alerts to the DevOps staff. In this work, we report on implementing a prototype of such a platform based on the 7-layered architecture pattern, which leverages micro-service principles to distribute tasks among highly scalable, resources-efficient modules. The modules interact with each other via an instance of the Publish-Subscribe architectural pattern. The platform is deployed on the IBM Cloud service infrastructure and is used to detect anomalies in logs emitted by the IBM Cloud services, hence the dogfooding.

SEDec 21, 2018
On Testing Quantum Programs

Andriy Miranskyy, Lei Zhang

A quantum computer (QC) can solve many computational problems more efficiently than a classic one. The field of QCs is growing: companies (such as DWave, IBM, Google, and Microsoft) are building QC offerings. We position that software engineers should look into defining a set of software engineering practices that apply to QC's software. To start this process, we give examples of challenges associated with testing such software and sketch potential solutions to some of these challenges.

SEJun 15, 2018
On Challenges of Cloud Monitoring

William Pourmajidi, John Steinbacher, Tony Erwin et al.

Cloud services are becoming increasingly popular: 60\% of information technology spending in 2016 was Cloud-based, and the size of the public Cloud service market will reach \$236B by 2020. To ensure reliable operation of the Cloud services, one must monitor their health. While a number of research challenges in the area of Cloud monitoring have been solved, problems are remaining. This prompted us to highlight three areas, which cause problems to practitioners and require further research. These three areas are as follows: A) defining health states of Cloud systems, B) creating unified monitoring environments, and C) establishing high availability strategies. In this paper we provide details of these areas and suggest a number of potential solutions to the challenges. We also show that Cloud monitoring presents exciting opportunities for novel research and practice.

CRMay 22, 2018
Logchain: Blockchain-assisted Log Storage

William Pourmajidi, Andriy Miranskyy

During the normal operation of a Cloud solution, no one usually pays attention to the logs except technical department, which may periodically check them to ensure that the performance of the platform conforms to the Service Level Agreements. However, the moment the status of a component changes from acceptable to unacceptable, or a customer complains about accessibility or performance of a platform, the importance of logs increases significantly. Depending on the scope of the issue, all departments, including management, customer support, and even the actual customer, may turn to logs to find out what has happened, how it has happened, and who is responsible for the issue. The party at fault may be motivated to tamper the logs to hide their fault. Given the number of logs that are generated by the Cloud solutions, there are many tampering possibilities. While tamper detection solution can be used to detect any changes in the logs, we argue that critical nature of logs calls for immutability. In this work, we propose a blockchain-based log system, called Logchain, that collects the logs from different providers and avoids log tampering by sealing the logs cryptographically and adding them to a hierarchical ledger, hence, providing an immutable platform for log storage.

SEMay 2, 2018
Architecture for Analysis of Streaming Data

Sheik Hoque, Andriy Miranskyy

While several attempts have been made to construct a scalable and flexible architecture for analysis of streaming data, no general model to tackle this task exists. Thus, our goal is to build a scalable and maintainable architecture for performing analytics on streaming data. To reach this goal, we introduce a 7-layered architecture consisting of microservices and publish-subscribe software. Our study shows that this architecture yields a good balance between scalability and maintainability due to high cohesion and low coupling of the solution, as well as asynchronous communication between the layers. This architecture can help practitioners to improve their analytic solutions. It is also of interest to academics, as it is a building block for a general architecture for processing streaming data.

SEMay 2, 2018
Online and Offline Analysis of Streaming Data

Sheik Hoque, Andriy Miranskyy

Online and offline analytics have been traditionally treated separately in software architecture design, and there is no existing general architecture that can support both. Our objective is to go beyond and introduce a scalable and maintainable architecture for performing online as well as offline analysis of streaming data. In this paper, we propose a 7-layered architecture utilising microservices, publish-subscribe pattern, and persistent storage. The architecture ensures high cohesion, low coupling, and asynchronous communication between the layers, thus yielding a scalable and maintainable solution. This design can help practitioners to engage their online and offline use cases in one single architecture, and also is of interest to academics, as it is a building block for a general architecture supporting data analysis.

SEJul 28, 2016
Towards Automated Performance Bug Identification in Python

Sokratis Tsakiltsidis, Andriy Miranskyy, Elie Mazzawi

Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission critical applications, financial, and real time systems. In this work we focused on early detection of performance bugs; our software under study was a real time system used in the advertisement/marketing domain. Goal: Find a simple and easy to implement solution, predicting performance bugs. Method: We built several models using four machine learning methods, commonly used for defect prediction: C4.5 Decision Trees, Naïve Bayes, Bayesian Networks, and Logistic Regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file's age and size as explanatory variables, can be used to predict performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that reducing the number of changes delivered on a commit, can decrease the chance of performance bug injection. Conclusions: We believe that our approach can help practitioners to eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians, establishing a link between functional bugs and (non-functional) performance bugs, and explicitly showing that attributes used for prediction of functional bugs can be used for prediction of performance bugs.

SESep 24, 2015
Ordering stakeholder viewpoint concerns for holistic and incremental Enterprise Architecture: the W6H framework

Mujahid Sultan, Andriy Miranskyy

Context: Enterprise Architecture (EA) is a discipline which has evolved to structure the business and its alignment with the IT systems. One of the popular enterprise architecture frameworks is Zachman framework (ZF). This framework focuses on describing the enterprise from six viewpoint perspectives of the stakeholders. These six perspectives are based on English language interrogatives 'what', 'where', 'who', 'when', 'why', and 'how' (thus the term W5H Journalists and police investigators use the W5H to describe an event. However, EA is not an event, creation and evolution of EA challenging. Moreover, the ordering of viewpoints is not defined in the existing EA frameworks, making data capturing process difficult. Our goals are to 1) assess if W5H is sufficient to describe modern EA and 2) explore the ordering and precedence among the viewpoint concerns. Method: we achieve our goals by bringing tools from the Linguistics, focusing on a full set of English Language interrogatives to describe viewpoint concerns and the inter-relationships and dependencies among these. Application of these tools is validated using pedagogical EA examples. Results: 1) We show that addition of the seventh interrogative 'which' to the W5H set (we denote this extended set as W6H) yields extra and necessary information enabling creation of holistic EA. 2) We discover that particular ordering of the interrogatives, established by linguists (based on semantic and lexical analysis of English language interrogatives), define starting points and the order in which viewpoints should be arranged for creating complete EA. 3) We prove that adopting W6H enables creation of EA for iterative and agile SDLCs, e.g. Scrum. Conclusions: We believe that our findings complete creation of EA using ZF by practitioners, and provide theoreticians with tools needed to improve other EA frameworks, e.g., TOGAF and DoDAF.

SEAug 8, 2015
Ordering Interrogative Questions for Effective Requirements Engineering: The W6H Pattern

Mujahid Sultan, Andriy Miranskyy

Requirements elicitation and requirements analysis are important practices of Requirements Engineering. Elicitation techniques, such as interviews and questionnaires, rely on formulating interrogative questions and asking these in a proper order to maximize the accuracy of the information being gathered. Information gathered during requirements elicitation then has to be interpreted, analyzed, and validated. Requirements analysis involves analyzing the problem and solutions spaces. In this paper, we describe a method to formulate interrogative questions for effective requirements elicitation based on the lexical and semantic principles of the English language interrogatives, and propose a pattern to organize stakeholder viewpoint concerns for better requirements analysis. This helps requirements engineer thoroughly describe problem and solutions spaces. Most of the previous requirements elicitation studies included six out of the seven English language interrogatives 'what', 'where', 'when', 'who', 'why', and 'how' (denoted by W5H) and did not propose any order in the interrogatives. We show that extending the set of six interrogatives with 'which' (denoted by W6H) improves the generation and formulation of questions for requirements elicitation and facilitates better requirements analysis via arranging stakeholder views. We discuss the interdependencies among interrogatives (for requirements engineer to consider while eliciting the requirements) and suggest an order for the set of W6H interrogatives. The proposed W6H-based reusable pattern also aids requirements engineer in organizing viewpoint concerns of stakeholders, making this pattern an effective tool for requirements analysis.