AI | Jan 26, 2023
Towards Knowledge-Centric Process Mining
Asjad Khan, Arsal Huda, Aditya Ghose et al.
Process analytic approaches play a critical role in supporting the practice of business process management and continuous process improvement by leveraging process-related data to identify performance bottlenecks, extracting insights about reducing costs and optimizing the utilization of available resources. Process analytic techniques often have to contend with real-world settings where available logs are noisy or incomplete. In this paper we present an approach that permits process analytics techniques to deliver value in the face of noisy/incomplete event logs. Our approach leverages knowledge graphs to mitigate the effects of noise in event logs while supporting process analysts in understanding variability associated with event logs.
CR | Dec 28, 2021 | Code
Mining and Classifying Privacy and Data Protection Requirements in Issue Reports
Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul et al.
Digital and physical footprints are a trail of user activities collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations state key principles only at a high level, making it difficult for software engineers to design and implement privacy-aware systems. In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks: the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. These requirements are extracted, refined and classified into a level at which they can be mapped to issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. Based on the identified privacy requirements, the taxonomy can also be used to trace back to the regulations, standards and frameworks that the software projects have not complied with.
SE | Jan 5, 2021 | Code
A Taxonomy for Mining and Classifying Privacy Requirements in Issue Reports
Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul et al.
Context: Digital and physical trails of user activities are collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations state key principles only at a high level, making it difficult for software engineers to design and implement privacy-aware systems. Objective: In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks: the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. Methods: These requirements are extracted, refined and classified (using the goal-based requirements analysis method) into a level at which they can be mapped to issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. Results: The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Conclusion: Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. Based on the identified privacy requirements, the taxonomy can also be used to trace back to the regulations, standards and frameworks that the software projects have not complied with.
SE | Feb 3, 2018 | Code
A deep tree-based model for software defect prediction
Hoa Khanh Dam, Trang Pham, Shien Wee Ng et al.
Defects are common in software systems and can potentially cause various problems to software users. Different methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches however do not sufficiently capture the syntax and different levels of semantics of source code, an important capability for building accurate prediction models. In this paper, we develop a novel prediction model which is capable of automatically learning features for representing source code and using them for defect prediction. Our prediction system is built upon the powerful deep learning, tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code. An evaluation on two datasets, one from open source projects contributed by Samsung and the other from the public PROMISE repository, demonstrates the effectiveness of our approach for both within-project and cross-project predictions.
SE | Sep 2, 2016 | Code
A deep learning model for estimating story points
Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran et al.
Although there has been substantial research in software analytics for effort estimation in traditional software projects, little work has been done for estimation in agile projects, especially estimating user stories or issues. Story points are the most common unit of measure used for estimating the effort involved in implementing a user story or resolving an issue. In this paper, we offer for the first time a comprehensive dataset for story points-based estimation that contains 23,313 issues from 16 open source projects. We also propose a prediction model for estimating story points based on a novel combination of two powerful deep learning architectures: long short-term memory and recurrent highway network. Our prediction system is end-to-end trainable from raw input data to prediction outcomes without any manual feature engineering. An empirical evaluation demonstrates that our approach consistently outperforms three common effort estimation baselines and two alternatives in both Mean Absolute Error and the Standardized Accuracy.
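The two evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of Standardized Accuracy, SA = (1 - MAE / MAE_guess) x 100, where MAE_guess is the error of a random-guessing baseline that predicts each item with the actual value of some other item. The abstract does not spell out the evaluation setup, so the function names and the exact (non-sampled) baseline below are illustrative assumptions rather than the authors' code.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error between actual and predicted story points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def random_guess_mae(actual: Sequence[float]) -> float:
    """Expected MAE of random guessing (assumed baseline).

    Each item is 'predicted' with the actual value of a different item,
    averaged exactly over all ordered pairs instead of sampling.
    """
    n = len(actual)
    total = sum(
        abs(actual[i] - actual[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def standardized_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """SA in percent: how much better the model is than random guessing."""
    return (1 - mae(actual, predicted) / random_guess_mae(actual)) * 100


# Hypothetical story points for five issues (made-up data, not from the paper).
actual_points = [1, 2, 3, 5, 8]
predicted_points = [1, 3, 3, 5, 5]
print(mae(actual_points, predicted_points))
print(standardized_accuracy(actual_points, predicted_points))
```

A lower MAE is better, while a higher SA is better; an SA of 100 corresponds to perfect predictions and an SA near 0 to a model no better than guessing.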
AI | Feb 16, 2025
Game-Of-Goals: Using adversarial games to achieve strategic resilience
Aditya Ghose, Asjad Khan
Our objective in this paper is to develop machinery that makes a given organizational strategic plan resilient to the actions of competitor agents (adverse environmental actions). We assume that we are given a goal tree representing strategic goals (which can also be seen as business requirements for a software system), and that competitor agents behave in a maximally adversarial fashion (taking actions that oppose our sub-goals, or our goals in general). We use game tree search methods (such as minimax) to select an optimal execution strategy (at a given point in time) that maximizes our chances of achieving our (high-level) strategic goals. Our machinery helps us determine which path to follow (strategy selection) to achieve the best end outcome. This is done by comparing the alternative execution strategies available to us via an evaluation function. Our evaluation function is based on the idea that we want to make our execution plans defensible (future-proof) by selecting execution strategies that make us least vulnerable to adversarial actions by the competitor agents, i.e., we want to select an execution strategy that leaves minimum room (or options) for the adversary to impede or damage our business goals and plans.
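The strategy-selection idea in this abstract can be sketched with plain minimax over a small game tree. This is only an illustration under stated assumptions: the `Node` class, the tree shape, and the numeric leaf scores are hypothetical, and the leaf scores stand in for the paper's evaluation function, which the abstract describes only informally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a (hypothetical) game tree built over a goal tree."""
    name: str
    score: float = 0.0  # leaf payoff from our point of view (assumed values)
    children: List["Node"] = field(default_factory=list)


def minimax(node: Node, maximizing: bool) -> float:
    """Value of a node when we maximize and the adversary minimizes."""
    if not node.children:
        return node.score
    values = [minimax(child, not maximizing) for child in node.children]
    return max(values) if maximizing else min(values)


def best_strategy(root: Node) -> Node:
    """Pick the execution strategy whose worst-case outcome is best for us."""
    # After we pick a strategy, it is the adversary's (minimizing) turn.
    return max(root.children, key=lambda c: minimax(c, maximizing=False))


# Two candidate strategies; each leaf is an adversary response with a payoff.
root = Node("strategic goal", children=[
    Node("strategy A", children=[Node("counter A1", 3), Node("counter A2", 5)]),
    Node("strategy B", children=[Node("counter B1", 1), Node("counter B2", 9)]),
])
print(best_strategy(root).name)
```

Strategy B has the best single outcome, but strategy A is chosen because its worst case is better, which mirrors the abstract's emphasis on being least vulnerable to a maximally adversarial competitor.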
SE | Dec 28, 2021
On Privacy Weaknesses and Vulnerabilities in Software Systems
Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Aditya Ghose
In this digital era, our privacy is under constant threat as our personal data and traceable online/offline activities are frequently collected, processed and transferred by many software applications. Privacy attacks are often formed by exploiting vulnerabilities found in those software applications. The Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) systems are currently the main sources that software engineers rely on for understanding and preventing publicly disclosed software vulnerabilities. However, our study on all 922 weaknesses in the CWE and 156,537 vulnerabilities registered in the CVE to date has found a very small coverage of privacy-related vulnerabilities in both systems, only 4.45% in CWE and 0.1% in CVE. These also cover only a small number of areas of privacy threats that have been raised in existing privacy software engineering research, privacy regulations and frameworks, and relevant reputable organisations. The actionable insights generated from our study led to the introduction of 11 new common privacy weaknesses to supplement the CWE system, making it become a source for both security and privacy vulnerabilities.
SE | May 5, 2021
Engineering Blockchain Based Software Systems: Foundations, Survey, and Future Directions
Mahdi Fahmideh, John Grundy, Aakash Ahmed et al.
Many scientific and practical areas have shown increasing interest in reaping the benefits of blockchain technology to empower software systems. However, the unique characteristics and requirements associated with Blockchain Based Software (BBS) systems raise new challenges across the development lifecycle that entail an extensive improvement of conventional software engineering. This article presents a systematic literature review of the state-of-the-art in BBS engineering research from a software engineering perspective. We characterize BBS engineering from the theoretical foundations, processes, models, and roles and discuss a rich repertoire of key development activities, principles, challenges, and techniques. The focus and depth of this survey not only gives software engineering practitioners and researchers a consolidated body of knowledge about current BBS development but also underpins a starting point for further research in this field.
SE | Feb 4, 2021
Human Values in Software Release Planning
Davoud Mougouei, Aditya Ghose, Hoa Dam et al.
Software products have become an integral part of human lives, and therefore need to account for human values such as privacy, fairness, and equality. Ignoring human values in software development leads to biases and violations of human values: racial biases in recidivism assessment and facial recognition software are well-known examples of such issues. One of the most critical steps in software development is Software Release Planning (SRP), where decisions are made about the presence or absence of the requirements (features) in the software. Such decisions are primarily guided by the economic value of the requirements, ignoring their impacts on a broader range of human values. That may result in ignoring (selecting) requirements that positively (negatively) impact human values, increasing the risk of value breaches in the software. To address this, we have proposed an Integer Programming approach to considering human values in software release planning. In this regard, an Integer Linear Programming (ILP) model has been proposed, that explicitly accounts for human values in finding an "optimal" subset of the requirements. The ILP model exploits the algebraic structure of fuzzy graphs to capture dependencies and conflicts among the values of the requirements.
SE | Dec 23, 2020
A Framework for Conditional Statement Technical Debt Identification and Description
Abdulaziz Alhefdhi, Hoa Khanh Dam, Yusuf Sulistyo Nugroho et al.
Technical Debt occurs when development teams favour short-term operability over long-term stability. Since this places software maintainability at risk, technical debt requires early attention to avoid paying for accumulated interest. Most of the existing work focuses on detecting technical debt using code comments, known as Self-Admitted Technical Debt (SATD). However, there are many cases where technical debt instances are not explicitly acknowledged but deeply hidden in the code. In this paper, we propose a framework that caters for the absence of SATD comments in code. Our Self-Admitted Technical Debt Identification and Description (SATDID) framework determines if technical debt should be self-admitted for an input code fragment. If that is the case, SATDID will automatically generate the appropriate descriptive SATD comment that can be attached to the code. While our approach is applicable in principle to any type of code fragment, we focus in this study on technical debt hidden in conditional statements, one of the most TD-carrying parts of code. We explore and evaluate different implementations of SATDID. The evaluation results demonstrate the applicability and effectiveness of our framework over multiple benchmarks. Compared with the results from the benchmarks, our approach provides improvements of at least 21.35%, 59.36%, 31.78%, and 583.33% in terms of Precision, Recall, F-1, and Bleu-4 scores, respectively. In addition, we conduct a human evaluation of the SATD comments generated by SATDID. On 1-5 and 0-5 scales for Acceptability and Understandability, the total means achieved by our approach are 3.128 and 3.172, respectively.
SE | Dec 21, 2020
Adversarial Patch Generation for Automated Program Repair
Abdulaziz Alhefdhi, Hoa Khanh Dam, Thanh Le-Cong et al.
Automated Program Repair has attracted significant research in recent years, leading to diverse techniques that focus on two main directions: search-based and semantic-based program repair. The former techniques often face challenges due to the vast search space, resulting in difficulties in identifying correct solutions, while the latter approaches are constrained by the capabilities of the underlying semantic analyser, limiting their scalability. In this paper, we propose NEVERMORE, a novel learning-based mechanism inspired by the adversarial nature of bugs and fixes. NEVERMORE is built upon the Generative Adversarial Networks architecture and trained on historical bug fixes to generate repairs that closely mimic human-produced fixes. Our empirical evaluation on 500 real-world bugs demonstrates the effectiveness of NEVERMORE in bug-fixing, generating repairs that match human fixes for 21.2% of the examined bugs. Moreover, we evaluate NEVERMORE on the Defects4J dataset, where our approach generates repairs for 4 bugs that remained unresolved by state-of-the-art baselines. NEVERMORE also fixes another 8 bugs which were only resolved by a subset of these baselines. Finally, we conduct an in-depth analysis of the impact of input and training styles on NEVERMORE's performance, revealing where the chosen style influences the model's bug-fixing capabilities.
AI | Jul 2, 2019
On Conforming and Conflicting Values
Kinzang Chhogyal, Abhaya Nayak, Aditya Ghose et al.
Values are things that are important to us. Actions activate values - they either go against our values or they promote our values. Values themselves can either be conforming or conflicting depending on the action that is taken. In this short paper, we argue that values may be classified as one of two types - conflicting and inherently conflicting values. They are distinguished by the fact that the latter in some sense can be thought of as being independent of actions. This allows us to do two things: i) check whether a set of values is consistent and ii) check whether it is in conflict with other sets of values.
AI | May 31, 2019
A Value-based Trust Assessment Model for Multi-agent Systems
Kinzang Chhogyal, Abhaya Nayak, Aditya Ghose et al.
An agent's assessment of its trust in another agent is commonly taken to be a measure of the reliability/predictability of the latter's actions. It is based on the trustor's past observations of the behaviour of the trustee and requires no knowledge of the inner-workings of the trustee. However, in situations that are new or unfamiliar, past observations are of little help in assessing trust. In such cases, knowledge about the trustee can help. A particular type of knowledge is that of values - things that are important to the trustor and the trustee. In this paper, based on the premise that the more values two agents share, the more they should trust one another, we propose a simple approach to trust assessment between agents based on values, taking into account if agents trust cautiously or boldly, and if they depend on others in carrying out a task.
SE | Dec 27, 2018
Towards effective AI-powered agile project management
Hoa Khanh Dam, Truyen Tran, John Grundy et al.
The rise of artificial intelligence (AI) has the potential to significantly transform the practice of project management. Project management has a large socio-technical element with many uncertainties arising from variability in human aspects, e.g., customers' needs, developers' performance and team dynamics. AI can assist project managers and team members by automating repetitive, high-volume tasks to enable project analytics for estimation and risk prediction, providing actionable recommendations, and even making decisions. AI is potentially a game changer for project management in helping to accelerate productivity and increase project success rates. In this paper, we propose a framework where AI technologies can be leveraged to offer support for managing agile projects, which have become increasingly popular in the industry.
NE | Feb 3, 2018
DeepProcess: Supporting business process execution using a MANN-based recommender system
Asjad Khan, Hung Le, Kien Do et al.
Process-aware Recommender systems can provide critical decision support functionality to aid business process execution by recommending what actions to take next. Based on recent advances in the field of deep learning, we present a novel memory-augmented neural network (MANN) based approach for constructing a process-aware recommender system. We propose a novel network architecture, namely Write-Protected Dual Controller Memory-Augmented Neural Network (DCw-MANN), for building prescriptive models. To evaluate the feasibility and usefulness of our approach, we consider three real-world datasets and show that our approach leads to better performance on several baselines for the task of suffix recommendation and next task prediction.
SE | Feb 2, 2018
Explainable Software Analytics
Hoa Khanh Dam, Truyen Tran, Aditya Ghose
Software analytics has been the subject of considerable recent attention but is yet to receive significant industry traction. One of the key reasons is that software practitioners are reluctant to trust predictions produced by the analytics machinery without understanding the rationale for those predictions. While complex models such as deep learning and ensemble methods improve predictive performance, they have limited explainability. In this paper, we argue that making software analytics models explainable to software practitioners is as important as achieving accurate predictions. Explainability should therefore be a key measure for evaluating software analytics models. We envision that explainability will be a key driver for developing software analytics models that are useful in practice. We outline a research roadmap for this space, building on social science, explainable artificial intelligence and software engineering.
SE | Aug 8, 2017
Automatic feature learning for vulnerability prediction
Hoa Khanh Dam, Truyen Tran, Trang Pham et al.
Code flaws or vulnerabilities are prevalent in software systems and can potentially cause a variety of problems including deadlock, information loss, or system failure. A variety of approaches have been developed to try and detect the most likely locations of such code vulnerabilities in large code bases. Most of them rely on manually designing features (e.g. complexity metrics or frequencies of code tokens) that represent the characteristics of the code. However, all suffer from challenges in sufficiently capturing both semantic and syntactic representation of source code, an important capability for building accurate prediction models. In this paper, we describe a new approach, built upon the powerful deep learning Long Short Term Memory model, to automatically learn both semantic and syntactic features in code. Our evaluation on 18 Android applications demonstrates that the prediction power obtained from our learned features is equal or even superior to what is achieved by state-of-the-art vulnerability prediction models: 3%-58% improvement for within-project prediction and 85% for cross-project prediction.
SE | Jul 30, 2016
DeepSoft: A vision for a deep model of software
Hoa Khanh Dam, Truyen Tran, John Grundy et al.
Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most of the existing work in software analytics still relies heavily on costly manual feature engineering processes, and they mainly address the traditional classification problems, as opposed to predicting future events. We present a vision for DeepSoft, an end-to-end generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports.
SE | Jul 24, 2015
Extracting State Transition Models from i* Models
Novarun Deb, Nabendu Chaki, Aditya Ghose
i* models are inherently sequence agnostic. There is an immediate need to bridge the gap between such a sequence-agnostic model and an industry-implemented process modelling standard like Business Process Modelling Notation (BPMN). This work is an attempt to build State Transition Models from i* models. In this paper, we first spell out the Naive Algorithm formally, which is along the lines of Formal Tropos. We demonstrate how the growth of the State Transition Model Space can be mapped to the problem of finding the number of possible paths between the Least Upper Bound (LUB) and the Greatest Lower Bound (GLB) of a k-dimensional hypercube Lattice structure. We formally present the mathematics for doing a quantitative analysis of the space growth. The main drawback of the Naive Algorithm is the hyperexponential explosion it causes in the State Transition Model space. This is identified, and the Semantic Implosion Algorithm is proposed, which exploits the temporal information embedded within the i* model of an enterprise to reduce the rate of growth of the State Transition Model space. A comparative quantitative analysis of the two approaches demonstrates the superiority of the Semantic Implosion Algorithm.
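To give a feel for the path-counting problem mentioned in this abstract, the following minimal sketch counts paths between the bottom and top elements of a k-dimensional hypercube lattice. It assumes a simple reading in which paths are monotone, meaning each step switches exactly one more coordinate from 0 to 1; under that assumption the count is k!. The abstract does not give the paper's exact formulation, so the function name and the monotonicity assumption are illustrative only.

```python
from functools import lru_cache
from math import factorial


def count_monotone_paths(k: int) -> int:
    """Count monotone paths from 00...0 to 11...1 in a k-dimensional hypercube.

    Vertices are encoded as bitmasks; one step sets a single unset bit.
    """
    top = (1 << k) - 1

    @lru_cache(maxsize=None)
    def paths(mask: int) -> int:
        if mask == top:
            return 1
        return sum(
            paths(mask | (1 << i))
            for i in range(k)
            if not mask & (1 << i)
        )

    return paths(0)


# Under the monotone assumption the count matches k! for small k.
for k in range(1, 7):
    print(k, count_monotone_paths(k), factorial(k))
```

Even under this restrictive reading the count grows factorially in k, which hints at why an approach that prunes orderings using temporal information can pay off.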
SE | Feb 25, 2014
Towards rational and minimal change propagation in model evolution
Hoa Khanh Dam, Aditya Ghose
A critical issue in the evolution of software models is change propagation: given a primary change that is made to a model in order to meet a new or changed requirement, what additional secondary changes are needed to maintain consistency within the model, and between the model and other models in the system? In practice, there are many ways of propagating changes to fix a given inconsistency, and how to justify and automate the selection between such change options remains a critical challenge. In this paper, we propose a number of postulates, inspired by the mature belief revision theory, that a change propagation process should satisfy to be considered rational and minimal. Such postulates enable us to reason about selecting alternative change options, and consequently to develop a machinery that automatically performs this task. We further argue that a possible implementation of such a change propagation process can be considered as a classical state space search in which each state represents a snapshot of the model in the process. This view naturally reflects the cascading nature of change propagation, where each change can require further changes to be made.