CRJun 1, 2021Code
On using distributed representations of source code for the detection of C security vulnerabilitiesDavid Coimbra, Sofia Reis, Rui Abreu et al.
This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec for this task is comparable to simple transformer-based methods such as pre-trained RoBERTa, and outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.
CYMay 22, 2018
Are Computer Science and Engineering Graduates Ready for the Software Industry? Experiences from an Industrial Student Training ProgramEray Tuzun, Hakan Erdogmus, Izzet Gokhan Ozbilgin
It has been 50 years since the term software engineering was coined in 1968 at a NATO conference. The field should be relatively mature by now, with most established universities covering core software engineering topics in their Computer Science programs and others offering specialized degrees. However, still many practitioners lament a lack of skills in new software engineering hires. With the growing demand for software engineers from the industry, this apparent gap becomes more and more pronounced. One corporate strategy to address this gap is for the industry to develop supplementary training programs before the hiring process, which could also help them screen viable candidates. In this paper, we report on our experiences and lessons learned in conducting a summer school program aimed at screening new graduates, introducing them to core skills relevant to the organization and the industry, and assessing their attitudes toward mastering those skills before the hiring process begins. Our experience suggests that such initiatives can be mutually beneficial for new hires and companies. We support this insight with pre- and post-training data collected from the participants during the first edition of such a summer school and a follow-up questionnaire conducted after a year with the graduates, 50% of whom was hired by the company shortly after the summer school.
SEFeb 23, 2017
Flipping a Graduate-Level Software Engineering Foundations CourseHakan Erdogmus, Cecile Peraire
Creating a graduate-level software engineering breadth course is challenging. The scope is wide. Students prefer hands-on work over theory. Industry increasingly values soft skills. Changing software technology requires the syllabus to be technology-agnostic, yet abstracting away technology compromises realism. Instructors must balance scope with depth of learning. At Carnegie Mellon University, we designed a flipped-classroom course that tackles these tradeoffs. The course has been offered since Fall 2014 in the Silicon Valley campus. In this paper, we describe the course's key features and summarize our experiences and lessons learned while designing, teaching, and maintaining it. We found that the pure flipped-classroom format was not optimal in ensuring sufficient transfer of knowledge, especially in remote settings. We initially underestimated teaching assistantship resources. We gradually complemented video lectures and hands-on live sessions with additional live components: easily replaceable recitations that focus on current technology and mini lectures that address application of theory and common wisdom. We also provided the students with more opportunities to share their successes and experiments with their peers. We achieved scalability by increasing the number of teaching assistants, paying attention to teaching assistant recruitment, and fostering a culture of mentoring among the teaching team.
SENov 18, 2016
A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?Davide Fucci, Hakan Erdogmus, Burak Turhan et al.
Background: Test-driven development (TDD) is a technique that repeats short coding cycles interleaved with testing. The developer first writes a unit test for the desired functionality, followed by the necessary production code, and refactors the code. Many empirical studies neglect unique process characteristics related to TDD iterative nature. Aim: We formulate four process characteristic: sequencing, granularity, uniformity, and refactoring effort. We investigate how these characteristics impact quality and productivity in TDD and related variations. Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with the granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.