Deep Reinforcement Fuzzing
This addresses security testing for software developers, but it appears incremental as it applies existing deep Q-learning to fuzzing.
The authors tackled the problem of finding security vulnerabilities in code by formalizing fuzzing as a reinforcement learning problem, and preliminary results show that this approach can outperform random fuzzing.
Fuzzing is the process of finding security vulnerabilities in input-processing code by repeatedly testing the code with modified inputs. In this paper, we formalize fuzzing as a reinforcement learning problem using the concept of Markov decision processes. This in turn allows us to apply state-of-the-art deep Q-learning algorithms that optimize rewards, which we define from runtime properties of the program under test. By observing the rewards caused by mutating with a specific set of actions performed on an initial program input, the fuzzing agent learns a policy that can next generate new higher-reward inputs. We have implemented this new approach, and preliminary empirical evidence shows that reinforcement fuzzing can outperform baseline random fuzzing.
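The following Python sketch is an added illustration, not code from the paper or its authors, of the formalization the abstract describes: the state is a runtime property of the current input (here, the set of branches it covers), the actions are byte-level mutations, and the reward is the number of newly covered branches plus a bonus when the program crashes. It substitutes tabular Q-learning for the paper's deep Q-network so that it runs with no dependencies; the toy header parser, its unchecked length field, and the token dictionary are all constructed for the example.

```python
"""Fuzzing as a Markov decision process, solved with tabular Q-learning.

Added example, not the paper's code. Assumptions: tabular Q-learning stands in
for the paper's deep Q-network; the state is the branch-coverage signature of
the current input; the target parser, its defect and the token dictionary are
constructed for this illustration.
"""
import random


# ---- constructed program under test ----------------------------------------
def target(data: bytes, cov: set) -> None:
    """Toy header parser that records the branch ids it executes in `cov`.

    The declared-length field is never bounds-checked; reaching that branch
    raises IndexError, which is the constructed defect the fuzzer should find.
    """
    cov.add("entry")
    if len(data) < 2:
        cov.add("short")
        return
    if data[0] != ord("M"):
        cov.add("nomagic")
        return
    cov.add("magic")
    if data[1] != ord("Z"):
        cov.add("badver")
        return
    cov.add("magic2")
    declared = data[2] if len(data) > 2 else 0
    if declared > len(data) - 3:
        cov.add("overrun")
        raise IndexError("declared length exceeds payload")
    cov.add("ok")


def run(data: bytes) -> tuple:
    """Execute the target once; return (frozenset of branches hit, crashed?)."""
    cov: set = set()
    try:
        target(data, cov)
        return frozenset(cov), False
    except IndexError:
        return frozenset(cov), True


# ---- action space: byte-level mutations ------------------------------------
DICT = [b"M", b"Z", b"MZ"]  # constructed token dictionary (AFL-style)


def mut_flip(d: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a random value."""
    if not d:
        return b"\x00"
    i = rng.randrange(len(d))
    return d[:i] + bytes([rng.randrange(256)]) + d[i + 1:]


def mut_insert(d: bytes, rng: random.Random) -> bytes:
    """Insert one random byte at a random position."""
    i = rng.randrange(len(d) + 1)
    return d[:i] + bytes([rng.randrange(256)]) + d[i:]


def mut_delete(d: bytes, rng: random.Random) -> bytes:
    """Delete one random byte, never shrinking below one byte."""
    if len(d) < 2:
        return d
    i = rng.randrange(len(d))
    return d[:i] + d[i + 1:]


def mut_dict(d: bytes, rng: random.Random) -> bytes:
    """Overwrite bytes at a random position with a dictionary token."""
    tok = rng.choice(DICT)
    i = rng.randrange(len(d)) if d else 0
    return d[:i] + tok + d[i + len(tok):]


ACTIONS = [mut_flip, mut_insert, mut_delete, mut_dict]


def fuzz(seed: bytes, episodes: int = 400, steps: int = 5, alpha: float = 0.5,
         gamma: float = 0.8, eps: float = 0.3, rng_seed: int = 1) -> dict:
    """Epsilon-greedy Q-learning over (coverage signature, action) pairs."""
    rng = random.Random(rng_seed)
    q: dict = {}                      # Q[(state, action)] -> expected return
    corpus = [seed]                   # inputs that reached new coverage
    total_cov = set(run(seed)[0])
    crash = None
    n_act = len(ACTIONS)
    for _ in range(episodes):
        data = rng.choice(corpus)
        state = run(data)[0]
        for _ in range(steps):
            if rng.random() < eps:    # explore a random mutation
                a = rng.randrange(n_act)
            else:                     # exploit the learned policy
                a = max(range(n_act), key=lambda i: q.get((state, i), 0.0))
            new = ACTIONS[a](data, rng)
            cov, crashed = run(new)
            fresh = cov - total_cov
            reward = len(fresh) + (10 if crashed else 0)  # runtime-property reward
            if fresh:
                total_cov |= fresh
                corpus.append(new)
            if crashed and crash is None:
                crash = new
            best_next = max(q.get((cov, i), 0.0) for i in range(n_act))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            data, state = new, cov
    return {"coverage": total_cov, "corpus": corpus, "crash": crash, "q": q}


if __name__ == "__main__":
    result = fuzz(b"hello")
    print("branches covered:", sorted(result["coverage"]))
    print("crashing input:", result["crash"])
```

The coverage signature is a deliberately coarse state abstraction so that the Q-table stays small; the paper's deep Q-network instead lets the agent condition on the input bytes themselves. The reward shaping (new branches plus a crash bonus) is what steers the learned policy toward the dictionary action once it has paid off, which is the "higher-reward inputs" behaviour the abstract describes; a defender running this against real parsers would keep the same loop and swap in an instrumented binary for `run`.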