Smart Mutations for Library Fuzzing
Supervisor(s): | Fabian Kilger |
Status: | finished |
Topic: | Others |
Author: | Alexandru Sasu |
Submission: | 2023-04-17 |
Type of Thesis: | Master's thesis |
Description: | Library fuzzing is crucial for software security, because a vulnerable library compromises every project that uses it. Unfortunately, fuzzers spend much of their time on unpromising inputs, and the role of mutations in library fuzzing specifically has been overlooked by the research community. In this thesis, we propose a novel system that uses neural networks to filter mutations during fuzzing in order to speed up the discovery of new paths. New path discovery is an important part of mutation-based fuzzing, and the way mutations are selected heavily impacts its speed and efficacy. While current methods for creating and choosing mutations show good results, we suggest that path discovery can be accelerated through filtering. We train a neural network to predict which mutations will not be useful in the long term so that they can be discarded. We developed two approaches for our design: one in which the model returns, for every seed, a heatmap of mutation relevance against which mutations can be compared, and one in which the model predicts how many coverage-increasing mutations will be generated if the current seed is saved. We also propose multiple neural network architectures, including long short-term memory (LSTM) and convolutional neural networks. In addition, we modified RULF, a fuzz-target generator for Rust libraries, so that the fuzz-target input is more structured and therefore easier for the neural networks to learn from. Furthermore, we modified AFL++ to query our neural networks before saving a seed and to decide whether to keep it. We evaluated both versions of our system by running the two modified fuzzers and the unmodified fuzzer on a set of five Rust libraries and comparing the results.
Both approaches fall short on three of the five targets, and the results are inconclusive for a fourth, but both increased the speed of new path discovery for the library url. Of the approximately 6800 edges found after 4 hours of testing with all three fuzzers, the unmodified fuzzer reached 6700 edges after 53 minutes on average, the bitmask fuzzer after 31 minutes, and the coverage prediction fuzzer after 27 minutes. These results show that the performance of the system depends strongly on the target library and its structure. They also highlight the potential of the proposed system to improve the efficiency of library fuzzing. |
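
The two filtering decisions described above can be illustrated with a minimal Python sketch. This is not the thesis implementation, which lives inside a modified AFL++ and uses trained neural networks. All function names, the thresholds, and the idea of passing the heatmap as a per-byte score list are assumptions made purely for illustration, and the model is replaced by a stand-in callable.

```python
from typing import Callable, Sequence


def mutated_positions(parent: bytes, child: bytes) -> list[int]:
    """Return the byte offsets at which child differs from parent.

    Offsets that exist in only one of the two inputs count as differences.
    """
    longest = max(len(parent), len(child))
    return [
        i
        for i in range(longest)
        if i >= len(parent) or i >= len(child) or parent[i] != child[i]
    ]


def keep_by_heatmap(
    parent: bytes,
    child: bytes,
    heatmap: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the thesis
) -> bool:
    """Heatmap-style filter: keep the mutation only if it touches at least
    one offset that the (assumed) per-byte relevance scores mark as relevant."""
    return any(
        i < len(heatmap) and heatmap[i] >= threshold
        for i in mutated_positions(parent, child)
    )


def keep_by_prediction(
    seed: bytes,
    predict: Callable[[bytes], float],
    min_expected: float = 1.0,  # hypothetical cut-off, not taken from the thesis
) -> bool:
    """Coverage-prediction filter: keep the seed only if the (stand-in) model
    expects it to yield enough coverage-increasing mutations."""
    return predict(seed) >= min_expected


if __name__ == "__main__":
    parent = b"GET /a"
    scores = [0.0, 0.0, 0.0, 0.0, 0.1, 0.9]  # made-up relevance per byte
    print(keep_by_heatmap(parent, b"GET /b", scores))   # touches offset 5
    print(keep_by_heatmap(parent, b"GEX /a", scores))   # touches offset 2
    print(keep_by_prediction(b"GET /b", lambda s: 3.0))
```

In both variants the fuzzer would call the filter at the point where it would otherwise save a new seed. The only difference is whether the decision is based on where the mutation landed or on a predicted count of useful future mutations.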