Description
Several studies have shown that cryptographic vulnerabilities are prevalent in real-world applications. They are mainly introduced by incorrect or insecure usages of cryptographic APIs, also referred to as crypto API misuses. Various crypto API misuse detectors have been developed that let developers scan their projects for such misuses and fix them. State-of-the-art detectors either rely on manually specified crypto API usage rules or train machine learning models on a labeled dataset. Thus, supporting a new crypto API requires a significant amount of time and expert knowledge, especially if the crypto API belongs to another programming language.
This thesis aims to develop a deep learning-based crypto API misuse detector that can be easily trained on an automatically created dataset to support new crypto APIs for multiple programming languages. To this end, we trained an Autoencoder (AE) on an unlabeled dataset containing crypto API usages represented as code property graphs (CPGs). To heuristically filter the dataset for secure crypto API usages, we proposed to apply code quality metrics that correlate with vulnerabilities. We extracted the CPGs based on API constraints using an existing library that supports multiple programming languages.
Our AE consists of a relational graph convolutional network (R-GCN) encoder and an Inner Product decoder. The encoder maps the input data to a compressed representation, which the decoder uses to reconstruct the input. To detect crypto API misuses, we follow an anomaly detection approach: the AE is trained to reconstruct frequently encountered patterns in the training dataset as closely as possible. Thus, if the AE fails to properly reconstruct a crypto API usage, that usage is considered a misuse.
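The anomaly detection idea can be illustrated with a minimal sketch. It assumes that node embeddings are already available (here they are hand-picked values, not the output of a trained R-GCN encoder), that reconstruction quality is measured by mean binary cross-entropy over the adjacency matrix, and that a fixed threshold separates normal usages from misuses. The function names, the toy graphs, and the threshold are hypothetical and not taken from the thesis.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inner_product_decode(z):
    """Reconstruct edge probabilities as sigmoid(z_i . z_j) for every node pair."""
    n = len(z)
    return [[sigmoid(sum(a * b for a, b in zip(z[i], z[j]))) for j in range(n)]
            for i in range(n)]

def reconstruction_error(adj, z):
    """Mean binary cross-entropy between the adjacency matrix and its reconstruction."""
    probs = inner_product_decode(z)
    n = len(adj)
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(probs[i][j], eps), 1.0 - eps)
            total -= adj[i][j] * math.log(p) + (1 - adj[i][j]) * math.log(1.0 - p)
    return total / (n * n)

def is_misuse(adj, z, threshold):
    """Flag a graph as anomalous when it is reconstructed poorly."""
    return reconstruction_error(adj, z) > threshold

# Hand-picked embeddings standing in for encoder output (assumption).
z = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]

# A graph whose edges match what the embeddings encode, and one that does not.
adj_typical = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
adj_unusual = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]

THRESHOLD = 1.0  # hypothetical; in practice it would be tuned on held-out data
print(is_misuse(adj_typical, z, THRESHOLD))
print(is_misuse(adj_unusual, z, THRESHOLD))
```

In this toy setting the first graph is reconstructed with low error and the second with high error, so only the second is flagged.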
Our results showed that the implemented AE is not able to detect crypto API misuses. We identified the inadequate decoder as the main reason: the decoder model we used is designed for undirected, non-relational graphs and is therefore not able to reconstruct a CPG. Thus, the next step is to design a decoder compatible with directed, relational graphs and to reevaluate the performance of an AE model for crypto API misuse detection.
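The limitation can be made concrete with a small sketch. An inner product score is symmetric and has no notion of edge type, so it cannot distinguish an edge u -> v from v -> u or one relation from another. For contrast, the sketch includes a bilinear score with one matrix per relation, which is one known way to make a decoder direction- and relation-aware. It is shown only as an illustration and is not the decoder proposed in the thesis; the embeddings, relation names, and matrices are hypothetical.

```python
def inner_product_score(z_u, z_v):
    """Score used by an inner product decoder: symmetric and relation-agnostic."""
    return sum(a * b for a, b in zip(z_u, z_v))

def bilinear_score(z_u, rel, z_v):
    """Bilinear score z_u^T R z_v with a relation-specific matrix R."""
    return sum(z_u[i] * rel[i][j] * z_v[j]
               for i in range(len(z_u))
               for j in range(len(z_v)))

# Hypothetical embeddings of two CPG nodes.
z_call = [1.0, 2.0]
z_arg = [3.0, -1.0]

# Hypothetical relation matrices for two illustrative edge types.
relations = {
    "AST": [[0.0, 1.0], [0.0, 0.0]],
    "DFG": [[0.0, 0.0], [1.0, 0.0]],
}

# The inner product gives the same score in both directions.
print(inner_product_score(z_call, z_arg), inner_product_score(z_arg, z_call))

# The bilinear score depends on both the direction and the relation.
for name, rel in relations.items():
    print(name, bilinear_score(z_call, rel, z_arg), bilinear_score(z_arg, rel, z_call))
```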