Code Virtualization for Software Protection: Analysis and Countermeasures
Supervisor(s): | Julian Kirsch, Clemens Jonischkeit |
Status: | finished |
Topic: | Others |
Author: | Ludwig Peuckert |
Submission: | 2020-03-16 |
Type of Thesis: | Masterthesis |
Description: | The number of newly detected malware samples each year is immense, and industry and mobile devices are particular targets of interest. To keep their malicious functionality under the radar of antivirus software, malware developers obfuscate critical code sections. In recent years, a technique called code virtualization has gained popularity: the often short malicious code is translated into a different instruction set architecture, and at runtime the virtual code is interpreted by a virtual machine attached to the original program.

Code virtualization is a challenging obfuscation, and few approaches for reverse engineering it exist. Translation to an unknown instruction set architecture is effectively close to encryption. In addition, interpretation introduces a large overhead. Together, these facts render many deobfuscation approaches ineffective. Approaches such as symbolic execution and taint analysis depend heavily on input size, so detecting the virtualization boundaries, and thereby restricting the input to the virtualized code, is a key step in deobfuscation. Many approaches assume these boundaries are known beforehand, but finding them is a non-trivial task.

In this work, we present a new approach to identify virtualized sections. We exploit characteristic patterns in memory accesses, register usage, and jump structure. However, different virtual machines exhibit different patterns, which renders matching against known patterns ineffective. To overcome this, we introduce a modified version of autocorrelation that can detect recurring patterns of any shape. Once the boundaries are detected, we identify the virtual program counter to filter out false positives and to satisfy the assumption made by other approaches. We implement our approach as an overestimation to minimize false negatives. We test it on previously unseen virtual machines and detect most of their virtualized sections. An evaluation of false positives shows that the original trace can be reduced by up to 99% in many cases. |
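To make the mechanism concrete, here is a minimal sketch of a virtual machine of the kind the abstract describes: a fetch/decode/dispatch loop that interprets bytecode for an invented stack-based instruction set. The opcode values, the handler set, and the name `run_vm` are hypothetical and not modeled on any particular protector; a real protector would emit the interpreter as native machine code. The loop also records the virtual program counter `vpc`, the quantity the thesis identifies in order to filter false positives.

```python
# Hypothetical opcode values: a protector picks its own encoding,
# unrelated to the host instruction set.
PUSH, SUB, DUP, JNZ, HALT = 0x3A, 0x91, 0x5C, 0xE7, 0x00


def run_vm(bytecode: bytes) -> tuple[list[int], list[str], list[int]]:
    """Interpret `bytecode` with a fetch/decode/dispatch loop.

    Returns the final operand stack, the name of each handler executed,
    and the value of the virtual program counter (vpc) at each fetch.
    """
    stack: list[int] = []
    handlers: list[str] = []
    vpc_history: list[int] = []
    vpc = 0  # virtual program counter: an offset into the bytecode, not a host address
    while True:
        vpc_history.append(vpc)
        opcode = bytecode[vpc]  # fetch
        if opcode == PUSH:  # PUSH imm8: push the next byte
            stack.append(bytecode[vpc + 1])
            handlers.append("push")
            vpc += 2
        elif opcode == SUB:  # SUB: pop b, pop a, push (a - b) mod 256
            b, a = stack.pop(), stack.pop()
            stack.append((a - b) & 0xFF)
            handlers.append("sub")
            vpc += 1
        elif opcode == DUP:  # DUP: duplicate the top of the stack
            stack.append(stack[-1])
            handlers.append("dup")
            vpc += 1
        elif opcode == JNZ:  # JNZ target: pop; jump to target if the value is non-zero
            handlers.append("jnz")
            vpc = bytecode[vpc + 1] if stack.pop() != 0 else vpc + 2
        elif opcode == HALT:
            handlers.append("halt")
            return stack, handlers, vpc_history
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at vpc={vpc}")


# Virtualized form of a countdown loop: x = 3; do { x -= 1 } while (x != 0)
program = bytes([
    PUSH, 3,   # 0: x = 3
    PUSH, 1,   # 2: loop body starts here
    SUB,       # 4: x - 1
    DUP,       # 5: keep a copy for the test
    JNZ, 2,    # 6: back to offset 2 while x != 0
    HALT,      # 8
])

stack, handlers, vpcs = run_vm(program)
print(stack)  # [0]
print(handlers)
print(vpcs)   # [0, 2, 4, 5, 6, 2, 4, 5, 6, 2, 4, 5, 6, 8]
```

Every iteration of the `while` loop runs the same fetch and dispatch logic followed by one handler, so a native trace of such an interpreter keeps repeating, which is the kind of recurring pattern the approach exploits. Meanwhile `vpc` cycles through 2, 4, 5, 6 while the virtual loop runs.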
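The abstract does not spell out how its autocorrelation is modified, so the following is only a baseline sketch of the underlying idea, assuming an execution trace of instruction addresses. Because addresses are categorical, the usual product `x[i] * x[i + k]` is replaced by an equality test, and a sliding window flags regions whose best score over all lags reaches a threshold. The function names, window size, lag range, threshold, and synthetic addresses are hypothetical choices for illustration, not the thesis's parameters.

```python
from collections.abc import Hashable, Sequence


def match_autocorrelation(trace: Sequence[Hashable], lag: int) -> float:
    """Share of positions i where trace[i] == trace[i + lag].

    Instruction addresses are categorical, so equality replaces the
    product x[i] * x[i + lag] of the numeric autocorrelation.
    """
    n = len(trace) - lag
    if n <= 0:
        return 0.0
    return sum(trace[i] == trace[i + lag] for i in range(n)) / n


def periodicity(window: Sequence[Hashable], max_lag: int) -> float:
    """Best match score over lags 1..max_lag (1.0 = perfectly periodic)."""
    return max(match_autocorrelation(window, lag) for lag in range(1, max_lag + 1))


def find_periodic_region(trace, window=40, step=20, max_lag=20, threshold=0.5):
    """Return (start, end) spanning all windows scoring >= threshold, or None."""
    flagged = [
        start
        for start in range(0, len(trace) - window + 1, step)
        if periodicity(trace[start:start + window], max_lag) >= threshold
    ]
    if not flagged:
        return None
    return flagged[0], flagged[-1] + window


# Synthetic trace (hypothetical addresses): straight-line native code,
# then an interpreter loop, then native code again.
native_before = [0x1000 + 4 * i for i in range(40)]
dispatcher = [0x401000, 0x401004, 0x401008]  # fetch / decode / indirect jump
handlers = {name: [base, base + 4] for name, base in
            [("push", 0x402000), ("sub", 0x402100), ("dup", 0x402200), ("jnz", 0x402300)]}
vm_loop = []
for name in ["push", "sub", "dup", "jnz"] * 10:
    vm_loop += dispatcher + handlers[name]
native_after = [0x2000 + 4 * i for i in range(40)]

trace = native_before + vm_loop + native_after
print(find_periodic_region(trace))  # (40, 240)
```

Straight-line native code never repeats an address, so it scores 0 at every lag, while the interpreter loop scores 1.0 at lag 20 (four handlers times five instructions per iteration). Real handlers differ in length and control flow, so a single fixed lag matches only partially; this sketch does not handle that case.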