"You still have to study" -- On the Security of LLM generated code
This paper addresses security risks in AI-generated code, a topic relevant to programmers and educators, though its contribution is incremental because it builds on existing prompting techniques.
The paper analyzes the security of code generated by four major LLMs for Python and JavaScript. It finds that some LLMs initially generate code of which 65% is insecure, but that almost all of them eventually produce close to 100% secure code under increasing manual guidance from a skilled engineer.
We witness an increasing use of AI assistants even for routine (classroom) programming tasks. However, the code generated on the basis of a so-called "prompt" written by the programmer does not always meet accepted security standards. On the one hand, this may be due to a lack of best-practice examples in the training data. On the other hand, the quality of the programmer's prompt appears to influence whether the generated code contains weaknesses. In this paper we analyse four major LLMs with respect to the security of the code they generate. We do this on the basis of a case study for the Python and JavaScript languages, using the MITRE CWE catalogue as the guiding security definition. Our results show that, using different prompting techniques, some LLMs initially generate code of which 65% is deemed insecure by a trained security engineer. On the other hand, almost all analysed LLMs eventually generate code that is close to 100% secure, given increasing manual guidance from a skilled engineer.
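To make concrete what "insecure with respect to the MITRE CWE catalogue" can mean in Python, the following is a minimal sketch of one well-known weakness, CWE-89 (SQL injection), next to a hardened variant. It is an illustration only and is not taken from the paper's case study: the table layout, the function names find_user_insecure and find_user_secure, and the injected payload are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_insecure(conn: sqlite3.Connection, name: str) -> list:
    # CWE-89: the input is pasted into the SQL text, so a crafted
    # value can change the structure of the query itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_secure(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the driver passes the input as data,
    # so it is only ever compared against the name column.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"  # hypothetical attacker-controlled input

leaked = find_user_insecure(conn, payload)  # condition becomes always true
safe = find_user_secure(conn, payload)      # no user has this literal name

print(len(leaked), len(safe))
```

With the crafted input, the concatenated query returns every row in the table, while the parameterized query returns none. A prompt that asks only for "a function that looks up a user by name" does not rule out the first variant, which is the kind of gap that further guidance from an engineer would need to close.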