Semantics-Aware Neural Network-Based Malware Classification Model

Supervisor(s):	Bojan Kolosnjaji
Status:	finished
Topic:	Machine Learning Methods
Author:	Gehrig Matthias
Submission:	2018-09-17
Type of Thesis:	Master's thesis
Description The number of malware samples per year grows exponentially. Traditional antimalware tools struggle to keep up with this rapid development because they rely mainly on signature-based detection. The industry therefore needs automatic systems capable of large-scale malware analysis. In this thesis, the author proposes a classification approach based on convolutional neural networks combined with semantic features. Its purpose is to classify malware variants into their respective families. The ground truth for the malware families is generated by unsupervised clustering of VirusTotal results. The presented framework mainly uses static features extracted from malware binaries, including common ones such as bag of words and n-grams. These features are mainly engineered from so-called gadgets, which are short sequences of assembly instructions. The author furthermore shows that adding semantic features derived from control flow graphs improves the neural network's classification results with regard to the chosen metrics. The classification performance of the proposed approach is compared with that of popular machine learning reference algorithms such as Gradient Boosting Trees and Support Vector Machines.
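
To make the bag-of-words and n-gram features mentioned above more concrete, the following is a minimal sketch of how such counts could be derived from gadgets. It is not the thesis's implementation: the representation of a gadget as a list of opcode mnemonics, the function names `opcode_ngrams` and `gadget_features`, and the sample gadgets are all assumptions made for illustration.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return all contiguous n-grams of an opcode sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def gadget_features(gadgets, n=2):
    """Count single opcodes (bag of words) and opcode n-grams over gadgets.

    Assumption: each gadget is a list of opcode mnemonics. N-grams are
    built within a gadget only, never across gadget boundaries.
    """
    bag = Counter()
    grams = Counter()
    for gadget in gadgets:
        bag.update(gadget)
        grams.update(opcode_ngrams(gadget, n))
    return bag, grams


# Hypothetical gadgets, made up for this example.
gadgets = [
    ["push", "mov", "call", "ret"],
    ["mov", "add", "ret"],
]
bag, grams = gadget_features(gadgets, n=2)
```

Count vectors of this kind would typically be mapped to a fixed vocabulary before being passed to a classifier; how the thesis encodes them for the convolutional network is not stated in this description.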
