AIJan 26, 2023
Towards Knowledge-Centric Process MiningAsjad Khan, Arsal Huda, Aditya Ghose et al.
Process analytic approaches play a critical role in supporting the practice of business process management and continuous process improvement by leveraging process-related data to identify performance bottlenecks, extracting insights about reducing costs and optimizing the utilization of available resources. Process analytic techniques often have to contend with real-world settings where available logs are noisy or incomplete. In this paper we present an approach that permits process analytics techniques to deliver value in the face of noisy/incomplete event logs. Our approach leverages knowledge graphs to mitigate the effects of noise in event logs while supporting process analysts in understanding variability associated with event logs.
47.7ETMar 25
BRIDG-Q: Barren-Plateau-Resilient Initialisation with Data-Aware LLM-Generated Quantum CircuitsNgoc Nhi Nguyen, Thai T Vu, John Le et al.
Quantum circuit initialisation is a key bottleneck in variational quantum algorithms (VQAs), strongly impacting optimisation stability and convergence. Recent work shows that large language models (LLMs) can synthesise high-quality variational circuit architectures, but their continuous parameter predictions are unreliable. Conversely, data-driven initialisation methods such as BEINIT improve trainability via problem-adaptive priors, yet assume fixed ansatz templates and ignore generative circuit structure. We propose BRIDG-Q (Barren-Plateau-Resilient Initialisation with Data-Aware LLM-Generated Quantum Circuits), a neuro-symbolic pipeline that bridges this gap by coupling LLM-generated circuit architectures with empirical-Bayes parameter initialisation. BRIDG-Q uses AgentQ to generate problem-conditioned circuit topologies, removes generated parameters, and injects data-informed parameter initialisations to mitigate barren plateau effects. Evaluations on graph optimisation benchmarks using residual energy gap and convergence metrics show improved optimisation robustness, indicating that data-driven initialisation remains effective even for LLM-generated circuits, with oracle per-instance selection achieving approximately a 10% reduction in final residual energy.
46.0SEMar 24
Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning SystemsGuoxin Su, Thomas Robinson, Hoa Khanh Dam et al.
Deep reinforcement learning (DRL) has emerged as a powerful paradigm for solving complex decision-making problems. However, DRL-based systems still face significant dependability challenges particularly in real-time environments due to the simulation-to-reality gap, out-of-distribution observations, and the critical impact of latency. Latency-induced faults, in particular, can lead to unsafe or unstable behaviour, yet existing fault-tolerance approaches to DRL systems lack formal methods to rigorously analyse and optimise performance and safety simultaneously in real-time settings. To address this, we propose a formal framework for designing and analysing real-time switching mechanisms between DRL agents and alternative controllers. Our approach leverages Timed Automata (TAs) for explicit switch logic design, which is then syntactically converted to a Markov Decision Process (MDP) for formal analysis. We develop a novel convex query technique for multi-objective model checking, enabling the optimisation of soft performance objectives while ensuring hard safety constraints for MDPs. Furthermore, we present MOPMC, a GPU-accelerated software tool implementing this technique, demonstrating superior scalability in both model size and objective numbers.
CRDec 28, 2021Code
Mining and Classifying Privacy and Data Protection Requirements in Issue ReportsPattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul et al.
Digital and physical footprints are a trail of user activities collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations provide key principles in high-level, making it difficult for software engineers to design and implement privacy-aware systems. In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks, the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. These requirements are extracted, refined and classified into a level that can be used to map with issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. The taxonomy can also trace back to regulations, standards and frameworks that the software projects have not complied with based on the identified privacy requirements.
SEJan 5, 2021Code
A Taxonomy for Mining and Classifying Privacy Requirements in Issue ReportsPattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul et al.
Context: Digital and physical trails of user activities are collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations provide key principles in high-level, making it difficult for software engineers to design and implement privacy-aware systems. Objective: In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks, the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. Methods: These requirements are extracted, refined and classified (using the goal-based requirements analysis method) into a level that can be used to map with issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. Results: The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Conclusion: Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. The taxonomy can also trace back to regulations, standards and frameworks that the software projects have not complied with based on the identified privacy requirements.
SEFeb 3, 2018Code
A deep tree-based model for software defect predictionHoa Khanh Dam, Trang Pham, Shien Wee Ng et al.
Defects are common in software systems and can potentially cause various problems to software users. Different methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches however do not sufficiently capture the syntax and different levels of semantics of source code, an important capability for building accurate prediction models. In this paper, we develop a novel prediction model which is capable of automatically learning features for representing source code and using them for defect prediction. Our prediction system is built upon the powerful deep learning, tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code. An evaluation on two datasets, one from open source projects contributed by Samsung and the other from the public PROMISE repository, demonstrates the effectiveness of our approach for both within-project and cross-project predictions.
SESep 2, 2016Code
A deep learning model for estimating story pointsMorakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran et al.
Although there has been substantial research in software analytics for effort estimation in traditional software projects, little work has been done for estimation in agile projects, especially estimating user stories or issues. Story points are the most common unit of measure used for estimating the effort involved in implementing a user story or resolving an issue. In this paper, we offer for the \emph{first} time a comprehensive dataset for story points-based estimation that contains 23,313 issues from 16 open source projects. We also propose a prediction model for estimating story points based on a novel combination of two powerful deep learning architectures: long short-term memory and recurrent highway network. Our prediction system is \emph{end-to-end} trainable from raw input data to prediction outcomes without any manual feature engineering. An empirical evaluation demonstrates that our approach consistently outperforms three common effort estimation baselines and two alternatives in both Mean Absolute Error and the Standardized Accuracy.
SEDec 28, 2021
On Privacy Weaknesses and Vulnerabilities in Software SystemsPattaraporn Sangaroonsilp, Hoa Khanh Dam, Aditya Ghose
In this digital era, our privacy is under constant threat as our personal data and traceable online/offline activities are frequently collected, processed and transferred by many software applications. Privacy attacks are often formed by exploiting vulnerabilities found in those software applications. The Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) systems are currently the main sources that software engineers rely on for understanding and preventing publicly disclosed software vulnerabilities. However, our study on all 922 weaknesses in the CWE and 156,537 vulnerabilities registered in the CVE to date has found a very small coverage of privacy-related vulnerabilities in both systems, only 4.45\% in CWE and 0.1\% in CVE. These also cover only a small number of areas of privacy threats that have been raised in existing privacy software engineering research, privacy regulations and frameworks, and relevant reputable organisations. The actionable insights generated from our study led to the introduction of 11 new common privacy weaknesses to supplement the CWE system, making it become a source for both security and privacy vulnerabilities.
SEDec 23, 2020
A Framework for Conditional Statement Technical Debt Identification and DescriptionAbdulaziz Alhefdhi, Hoa Khanh Dam, Yusuf Sulistyo Nugroho et al.
Technical Debt occurs when development teams favour short-term operability over long-term stability. Since this places software maintainability at risk, technical debt requires early attention to avoid paying for accumulated interest. Most of the existing work focuses on detecting technical debt using code comments, known as Self-Admitted Technical Debt (SATD). However, there are many cases where technical debt instances are not explicitly acknowledged but deeply hidden in the code. In this paper, we propose a framework that caters for the absence of SATD comments in code. Our Self-Admitted Technical Debt Identification and Description (SATDID) framework determines if technical debt should be self-admitted for an input code fragment. If that is the case, SATDID will automatically generate the appropriate descriptive SATD comment that can be attached with the code. While our approach is applicable in principle to any type of code fragments, we focus in this study on technical debt hidden in conditional statements, one of the most TD-carrying parts of code. We explore and evaluate different implementations of SATDID. The evaluation results demonstrate the applicability and effectiveness of our framework over multiple benchmarks. Comparing with the results from the benchmarks, our approach provides at least 21.35%, 59.36%, 31.78%, and 583.33% improvements in terms of Precision, Recall, F-1, and Bleu-4 scores, respectively. In addition, we conduct human evaluation to the SATD comments generated by SATDID. In 1-5 and 0-5 scales for Acceptability and Understandability, the total means achieved by our approach are 3.128 and 3.172, respectively.
SEDec 21, 2020
Adversarial Patch Generation for Automated Program RepairAbdulaziz Alhefdhi, Hoa Khanh Dam, Thanh Le-Cong et al.
Automated Program Repair has attracted significant research in recent years, leading to diverse techniques that focus on two main directions: search-based and semantic-based program repair. The former techniques often face challenges due to the vast search space, resulting in difficulties in identifying correct solutions, while the latter approaches are constrained by the capabilities of the underlying semantic analyser, limiting their scalability. In this paper, we propose NEVERMORE, a novel learning-based mechanism inspired by the adversarial nature of bugs and fixes. NEVERMORE is built upon the Generative Adversarial Networks architecture and trained on historical bug fixes to generate repairs that closely mimic human-produced fixes. Our empirical evaluation on 500 real-world bugs demonstrates the effectiveness of NEVERMORE in bug-fixing, generating repairs that match human fixes for 21.2% of the examined bugs. Moreover, we evaluate NEVERMORE on the Defects4J dataset, where our approach generates repairs for 4 bugs that remained unresolved by state-of-the-art baselines. NEVERMORE also fixes another 8 bugs which were only resolved by a subset of these baselines. Finally, we conduct an in-depth analysis of the impact of input and training styles on NEVERMORE's performance, revealing where the chosen style influences the model's bug-fixing capabilities.
AIMay 31, 2019
A Value-based Trust Assessment Model for Multi-agent SystemsKinzang Chhogyal, Abhaya Nayak, Aditya Ghose et al.
An agent's assessment of its trust in another agent is commonly taken to be a measure of the reliability/predictability of the latter's actions. It is based on the trustor's past observations of the behaviour of the trustee and requires no knowledge of the inner-workings of the trustee. However, in situations that are new or unfamiliar, past observations are of little help in assessing trust. In such cases, knowledge about the trustee can help. A particular type of knowledge is that of values - things that are important to the trustor and the trustee. In this paper, based on the premise that the more values two agents share, the more they should trust one another, we propose a simple approach to trust assessment between agents based on values, taking into account if agents trust cautiously or boldly, and if they depend on others in carrying out a task.
SEDec 27, 2018
Towards effective AI-powered agile project managementHoa Khanh Dam, Truyen Tran, John Grundy et al.
The rise of Artificial intelligence (AI) has the potential to significantly transform the practice of project management. Project management has a large socio-technical element with many uncertainties arising from variability in human aspects e.g., customers' needs, developers' performance and team dynamics. AI can assist project managers and team members by automating repetitive, high-volume tasks to enable project analytics for estimation and risk prediction, providing actionable recommendations, and even making decisions. AI is potentially a game changer for project management in helping to accelerate productivity and increase project success rates. In this paper, we propose a framework where AI technologies can be leveraged to offer support for managing agile projects, which have become increasingly popular in the industry.
SEFeb 2, 2018
Explainable Software AnalyticsHoa Khanh Dam, Truyen Tran, Aditya Ghose
Software analytics has been the subject of considerable recent attention but is yet to receive significant industry traction. One of the key reasons is that software practitioners are reluctant to trust predictions produced by the analytics machinery without understanding the rationale for those predictions. While complex models such as deep learning and ensemble methods improve predictive performance, they have limited explainability. In this paper, we argue that making software analytics models explainable to software practitioners is as \emph{important} as achieving accurate predictions. Explainability should therefore be a key measure for evaluating software analytics models. We envision that explainability will be a key driver for developing software analytics models that are useful in practice. We outline a research roadmap for this space, building on social science, explainable artificial intelligence and software engineering.
SEAug 8, 2017
Automatic feature learning for vulnerability predictionHoa Khanh Dam, Truyen Tran, Trang Pham et al.
Code flaws or vulnerabilities are prevalent in software systems and can potentially cause a variety of problems including deadlock, information loss, or system failure. A variety of approaches have been developed to try and detect the most likely locations of such code vulnerabilities in large code bases. Most of them rely on manually designing features (e.g. complexity metrics or frequencies of code tokens) that represent the characteristics of the code. However, all suffer from challenges in sufficiently capturing both semantic and syntactic representation of source code, an important capability for building accurate prediction models. In this paper, we describe a new approach, built upon the powerful deep learning Long Short Term Memory model, to automatically learn both semantic and syntactic features in code. Our evaluation on 18 Android applications demonstrates that the prediction power obtained from our learned features is equal or even superior to what is achieved by state of the art vulnerability prediction models: 3%--58% improvement for within-project prediction and 85% for cross-project prediction.
SEAug 9, 2016
A deep language model for software codeHoa Khanh Dam, Truyen Tran, Trang Pham
Existing language models such as n-grams for software code often fail to capture a long context where dependent code elements scatter far apart. In this paper, we propose a novel approach to build a language model for software code to address this particular issue. Our language model, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term dependencies which occur frequently in software code. Results from our intrinsic evaluation on a corpus of Java projects have demonstrated the effectiveness of our language model. This work contributes to realizing our vision for DeepSoft, an end-to-end, generic deep learning-based framework for modeling software and its development process.
SEJul 30, 2016
DeepSoft: A vision for a deep model of softwareHoa Khanh Dam, Truyen Tran, John Grundy et al.
Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most of the existing work in software analytics still relies heavily on costly manual feature engineering processes, and they mainly address the traditional classification problems, as opposed to predicting future events. We present a vision for \emph{DeepSoft}, an \emph{end-to-end} generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports.
SEFeb 25, 2014
Towards rational and minimal change propagation in model evolutionHoa Khanh Dam, Aditya Ghose
A critical issue in the evolution of software models is change propagation: given a primary change that is made to a model in order to meet a new or changed requirement, what additional secondary changes are needed to maintain consistency within the model, and between the model and other models in the system? In practice, there are many ways of propagating changes to fix a given inconsistency, and how to justify and automate the selection between such change options remains a critical challenge. In this paper, we propose a number of postulates, inspired by the mature belief revision theory, that a change propagation process should satisfy to be considered rational and minimal. Such postulates enable us to reason about selecting alternative change options, and consequently to develop a machinery that automatically performs this task. We further argue that a possible implementation of such a change propagation process can be considered as a classical state space search in which each state represents a snapshot of the model in the process. This view naturally reflects the cascading nature of change propagation, where each change can require further changes to be made.