Morakot Choetkiertikul

SE
6papers
235citations
Novelty40%
AI Score45

6 Papers

53.4SEMar 31
When is Generated Code Difficult to Comprehend? Assessing AI Agent Python Code Proficiency in the Wild

Nanthit Temkulkiat, Chaiyong Ragkhitwetsagul, Morakot Choetkiertikul et al.

The rapid adoption of AI coding agents is fundamentally shifting software developers' roles from code authors to code reviewers. While developers spend a significant portion of their time reading and comprehending code, the linguistic proficiency and complexity of the Python code generated by these agents remain largely unexplored. This study investigates the code proficiency of AI agents to determine the skill level required for developers to maintain their code. Leveraging the AIDev dataset, we mined 591 pull requests containing 5,027 Python files generated by three distinct AI agents and employed pycefr, a static analysis tool that maps Python constructs to six proficiency levels, ranging from A1 (Basic) to C2 (Mastery), to analyze the code. Our results reveal that: AI agents predominantly generate Basic-level code, with over 90% of constructs falling into the A1 and A2 categories, and less than 1% classified as Mastery (C2); AI agents' and humans' pull requests share a broadly similar proficiency profile; High-proficiency code by AI agents are from feature addition and bug fixing tasks. These findings suggest that while AI-generated code is generally accessible to developers with basic Python skills, specific tasks may require advanced proficiency to review and maintain complex, agent-generated constructs.

CRDec 28, 2021Code
Mining and Classifying Privacy and Data Protection Requirements in Issue Reports

Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul et al.

Digital and physical footprints are a trail of user activities collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations provide key principles in high-level, making it difficult for software engineers to design and implement privacy-aware systems. In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks, the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. These requirements are extracted, refined and classified into a level that can be used to map with issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. The taxonomy can also trace back to regulations, standards and frameworks that the software projects have not complied with based on the identified privacy requirements.

SEJan 5, 2021Code
A Taxonomy for Mining and Classifying Privacy Requirements in Issue Reports

Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Morakot Choetkiertikul et al.

Context: Digital and physical trails of user activities are collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations provide key principles in high-level, making it difficult for software engineers to design and implement privacy-aware systems. Objective: In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks, the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. Methods: These requirements are extracted, refined and classified (using the goal-based requirements analysis method) into a level that can be used to map with issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. Results: The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Conclusion: Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. The taxonomy can also trace back to regulations, standards and frameworks that the software projects have not complied with based on the identified privacy requirements.

SESep 5, 2020Code
Teddy: Automatic Recommendation of Pythonic Idiom Usage For Pull-Based Software Projects

Purit Phan-udom, Naruedon Wattanakul, Tattiya Sakulniwat et al.

Pythonic code is idiomatic code that follows guiding principles and practices within the Python community. Offering performance and readability benefits, Pythonic code is claimed to be widely adopted by experienced Python developers, but can be a learning curve to novice programmers. To aid with Pythonic learning, we create an automated tool, called Teddy, that can help checking the Pythonic idiom usage. The tool offers a prevention mode with Just-In-Time analysis to recommend the use of Pythonic idiom during code review and a detection mode with historical analysis to run a thorough scan of idiomatic and non-idiomatic code. In this paper, we first describe our tool and an evaluation of its performance. Furthermore, we present a case study that demonstrates how to use Teddy in a real-life scenario on an Open Source project. An evaluation shows that Teddy has high precision for detecting Pythonic idiom and non-Pythonic code. Using interactive visualizations, we demonstrate how novice programmers can navigate and identify Pythonic idiom and non-Pythonic code in their projects. Our video demo with the full interactive visualizations is available at https://youtu.be/vOCQReSvBxA.

SESep 2, 2016Code
A deep learning model for estimating story points

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran et al.

Although there has been substantial research in software analytics for effort estimation in traditional software projects, little work has been done for estimation in agile projects, especially estimating user stories or issues. Story points are the most common unit of measure used for estimating the effort involved in implementing a user story or resolving an issue. In this paper, we offer for the \emph{first} time a comprehensive dataset for story points-based estimation that contains 23,313 issues from 16 open source projects. We also propose a prediction model for estimating story points based on a novel combination of two powerful deep learning architectures: long short-term memory and recurrent highway network. Our prediction system is \emph{end-to-end} trainable from raw input data to prediction outcomes without any manual feature engineering. An empirical evaluation demonstrates that our approach consistently outperforms three common effort estimation baselines and two alternatives in both Mean Absolute Error and the Standardized Accuracy.

SESep 7, 2021
FixMe: A GitHub Bot for Detecting and Monitoring On-Hold Self-Admitted Technical Debt

Saranphon Phaithoon, Supakarn Wongnil, Patiphol Pussawong et al.

Self-Admitted Technical Debt (SATD) is a special form of technical debt in which developers intentionally record their hacks in the code by adding comments for attention. Here, we focus on issue-related "On-hold SATD", where developers suspend proper implementation due to issues reported inside or outside the project. When the referenced issues are resolved, the On-hold SATD also need to be addressed, but since monitoring these issue reports takes a lot of time and effort, developers may not be aware of the resolved issues and leave the On-hold SATD in the code. In this paper, we propose FixMe, a GitHub bot that helps developers detecting and monitoring On-hold SATD in their repositories and notify them whenever the On-hold SATDs are ready to be fixed (i.e. the referenced issues are resolved). The bot can automatically detect On-hold SATD comments from source code using machine learning techniques and discover referenced issues. When the referenced issues are resolved, developers will be notified by FixMe bot. The evaluation conducted with 11 participants shows that our FixMe bot can support them in dealing with On-hold SATD. FixMe is available at https://www.fixmebot.app/ and FixMe's VDO is at https://youtu.be/YSz9kFxN_YQ.