HawkEye: Cross-Platform Malware Detection with Representation Learning on Graphs

Authors: Peng Xu, Youyi Zhang, Claudia Eckert, and Apostolis Zarras
Year/month: 2021/
Booktitle: ICANN - The International Conference on Artificial Neural Networks
Fulltext: HawkEye.pdf

Abstract

Malicious software, widely known as malware, is one of the biggest threats to our interconnected society. Cyber-criminals use malware to carry out their nefarious tasks. To address this issue, analysts have developed systems that can prevent malware from successfully infecting a machine. Unfortunately, these systems come with two significant limitations. First, they frequently target one specific platform or architecture and thus cannot be deployed universally. Second, code obfuscation techniques used by malware authors can degrade their performance. In this paper, we design and implement HawkEye, a control-flow-graph-based cross-platform malware detection system, to tackle both problems. In more detail, HawkEye uses a graph neural network with trainable instruction embeddings to convert the control flow graphs of executables into vectors, and then feeds these vectors to a machine-learning-based classifier to detect malware. We evaluate HawkEye on real samples from different platforms and operating systems, including Linux (x86, x64, and ARM-32), Windows (x86 and x64), and Android. HawkEye outperforms most existing works, with an accuracy of 96.82% on Linux, 93.39% on Windows, and 99.6% on Android. To the best of our knowledge, HawkEye is the first approach to apply graph neural networks, combined with natural language processing techniques, to malware detection.
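
The pipeline described in the abstract (instruction embeddings, a graph neural network over the control flow graph, then a classifier) can be illustrated with a minimal sketch. This is not the HawkEye implementation: the mean aggregation, the tanh update, the fixed random embedding table standing in for trained embeddings, the untrained logistic scorer, and all function names are assumptions chosen only to show the data flow on a toy three-block graph.

```python
import math
import random

DIM = 4  # embedding size, chosen arbitrarily for this sketch


def make_embedding_table(vocab, dim=DIM, seed=0):
    """Map each instruction mnemonic to a vector.

    Assumption: fixed random vectors stand in for embeddings that a real
    system would learn during training.
    """
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for tok in sorted(vocab)}


def mean_vectors(vectors, dim=DIM):
    """Element-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_cfg(blocks, edges, table, rounds=2, dim=DIM):
    """Turn a control flow graph into one fixed-length vector.

    blocks: {block_id: [mnemonic, ...]}, edges: [(src, dst), ...].
    """
    # Initial node state: average of the block's instruction embeddings.
    state = {b: mean_vectors([table[m] for m in insns], dim)
             for b, insns in blocks.items()}
    preds = {b: [] for b in blocks}
    for src, dst in edges:
        preds[dst].append(src)
    # Message passing: each block mixes in the mean state of its predecessors.
    for _ in range(rounds):
        new_state = {}
        for b in blocks:
            msg = mean_vectors([state[p] for p in preds[b]], dim)
            new_state[b] = [math.tanh(s + m) for s, m in zip(state[b], msg)]
        state = new_state
    # Readout: average all block states into a single graph vector.
    return mean_vectors(list(state.values()), dim)


def malware_score(graph_vec, weights, bias=0.0):
    """Logistic score in (0, 1); the weights here are untrained placeholders."""
    z = sum(w * x for w, x in zip(weights, graph_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy CFG with three basic blocks (hypothetical example data).
blocks = {0: ["push", "mov", "cmp", "jne"], 1: ["call", "jmp"], 2: ["xor", "ret"]}
edges = [(0, 1), (0, 2), (1, 2)]
vocab = {m for insns in blocks.values() for m in insns}
table = make_embedding_table(vocab)

vec = embed_cfg(blocks, edges, table)
score = malware_score(vec, weights=[0.5] * DIM)
print(vec, score)
```

Because the graph vector has the same length regardless of how many basic blocks the executable contains, any standard classifier can consume it. Working on mnemonics and graph structure rather than raw bytes is one way such a representation could carry across architectures.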

Bibtex:

@conference {
author = { Peng Xu and Youyi Zhang and Claudia Eckert and Apostolis Zarras },
title = { HawkEye: Cross-Platform Malware Detection with Representation Learning on Graphs },
year = { 2021 },
booktitle = { ICANN - The International Conference on Artificial Neural Networks },
url = {https://www.sec.in.tum.de/i20/publications/hawkeye-cross-platform-malware-detection-with-representation-learning-on-graphs/@@download/file/HawkEye.pdf}
}