Exploring Improvements to Type-based Static Analysis in the Linux Kernel
Exploring Improvements to Type-based Static Analysis in the Linux Kernel
Supervisor(s): | Marius Momeu |
Status: | finished |
Topic: | Others |
Author: | Kai Angnis |
Submission: | 2024-10-24 |
Type of Thesis: | Bachelorthesis |
DescriptionGenerating precise call graphs is relevant for many use-cases in IT security, like static bug detection, fuzzing or enforcing control flow integrity, however existing solutions lead to overapproximated results due to the presence of function pointers. Multi Layer Type Analysis (MLTA) is a state of the art approach that improves the precision of generating call graphs by analysing the relationship between the datatypes used in the source code. We assessed using MLTA to generate call graphs of Linux Kernel functions and we determined that they still contain a large number of functions from unrelated parts of the kernel thus making them ineffective for said security use-cases. This work aims to improve the precision of MLTA, when we evaluate it on the Linux kernel. We first analyse the implementation of MLTA and find that it does not perform its analysis across function boundaries. Therefor we introduce Inter-Procedural Multi Layer Type Analysis (IMLTA), which is capable of continuing its analysis across function boundaries. Second we analyse the indirect calls in the Linux kernel, for which IMLTA reports a high number of targets and develop mechanisms that can further improve its precision. Third we introduce a new metric to better quantify the precision of the resulting call graph by counting the reachable functions from every graph node. Using the new metric, we determine that IMLTA is able to reduce the average size of the call graphs by 93% compared to the reference implementation. However it still contains functions that can still reach a high number of other functions. Therefore we evaluate the edges in the call graph to further understand whats causing this. We observe that some of the important functions in the graph contain indirect calls with an overapproximated callee set. |