TUM Logo

Exploring Improvements to Type-based Static Analysis in the Linux Kernel

Exploring Improvements to Type-based Static Analysis in the Linux Kernel

Supervisor(s): Marius Momeu
Status: finished
Topic: Others
Author: Kai Angnis
Submission: 2024-10-24
Type of Thesis: Bachelorthesis

Description

Generating precise call graphs is relevant for many use-cases in IT security, like static
bug detection, fuzzing or enforcing control flow integrity, however existing solutions
lead to overapproximated results due to the presence of function pointers.
Multi Layer Type Analysis (MLTA) is a state of the art approach that improves the
precision of generating call graphs by analysing the relationship between the datatypes
used in the source code. We assessed using MLTA to generate call graphs of Linux Kernel
functions and we determined that they still contain a large number of functions from
unrelated parts of the kernel thus making them ineffective for said security use-cases.
This work aims to improve the precision of MLTA, when we evaluate it on the Linux
kernel. We first analyse the implementation of MLTA and find that it does not perform
its analysis across function boundaries. Therefor we introduce Inter-Procedural Multi
Layer Type Analysis (IMLTA), which is capable of continuing its analysis across function
boundaries. Second we analyse the indirect calls in the Linux kernel, for which IMLTA
reports a high number of targets and develop mechanisms that can further improve
its precision. Third we introduce a new metric to better quantify the precision of the
resulting call graph by counting the reachable functions from every graph node. Using
the new metric, we determine that IMLTA is able to reduce the average size of the call
graphs by 93% compared to the reference implementation. However it still contains
functions that can still reach a high number of other functions. Therefore we evaluate the
edges in the call graph to further understand whats causing this. We observe that some
of the important functions in the graph contain indirect calls with an overapproximated
callee set.