Semantics-Aware Neural Network-Based Malware Classification Model
Supervisor(s): | Bojan Kolosnjaji |
Status: | finished |
Topic: | Machine Learning Methods |
Author: | Gehrig Matthias |
Submission: | 2018-09-17 |
Type of Thesis: | Master's Thesis |
Description: | The absolute number of malware samples per year grows exponentially. Traditional antimalware tools struggle to keep up with this rapid development because they rely mainly on signature-based detection. The industry therefore wants automatic systems capable of large-scale malware analysis. In this thesis, the author proposes a classification approach based on convolutional neural networks combined with semantic features. Its purpose is to classify malware variants into their respective families. The ground truth for the malware families is generated via unsupervised clustering based on VirusTotal results. The presented framework mainly uses static features extracted from malware binaries, including common ones such as bag of words and n-grams. These features are mainly engineered from so-called gadgets, which are short sequences of assembly instructions. The author furthermore shows that adding semantic features derived from control flow graphs improves the neural network's classification results with respect to the chosen metrics. The classification performance of the proposed approach is compared with popular machine learning reference algorithms such as Gradient Boosting Trees and Support Vector Machines. |
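As a rough illustration of the static features mentioned in the description, the following minimal sketch counts n-grams over gadgets. It is not the thesis implementation: the function names, the choice of opcode mnemonics as tokens, and the sample gadgets are all assumptions made for this example. In practice the gadgets would come from disassembling a binary.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return all contiguous n-grams of an opcode sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def bag_of_ngrams(gadgets, n=2):
    """Count n-grams over a list of gadgets.

    Each gadget is assumed to be a list of opcode mnemonics (hypothetical
    representation chosen for this sketch).
    """
    counts = Counter()
    for gadget in gadgets:
        counts.update(opcode_ngrams(gadget, n))
    return counts


# Hypothetical gadgets; real ones would be extracted by a disassembler.
gadgets = [
    ["pop", "pop", "ret"],
    ["mov", "pop", "ret"],
]

# Sparse feature vector: maps each bigram to its number of occurrences.
features = bag_of_ngrams(gadgets, n=2)
```

With `n=1` the same function yields a plain bag-of-words count over single mnemonics. Such counts could then be mapped to a fixed-length vector before being passed to a classifier.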