Description
As Machine Learning (ML) undergoes rapid evolution, it has become the backbone of numerous application domains, including medicine, finance, and even safety-critical systems. Since the underlying ML models are crucial for the correct execution of the systems they drive, the importance of their security grows accordingly. The challenge of ML security becomes increasingly complex given the inherently vulnerable nature of the models themselves. Hence, an adversary may exploit either the model directly, through techniques such as Adversarial Machine Learning (AML), or the target infrastructure hosting it, in order to gain unauthorized access. In this work, we concentrate on the latter and explore the efficient use of process-based Trusted Execution Environments (TEEs) to facilitate trusted model inference. While previous efforts have attempted to tackle this problem, they were hindered by shortcomings such as the lack of enclave GPU support, numerically unstable algorithms, or poor resource management. These issues, combined with the TEE's memory and computational limitations, prevented them from achieving the performance necessary to deploy real-world models. A few solutions have tried to incorporate the untrusted GPU into the TEE's execution pipeline, but often at the cost of considerable precision loss or runtime slowdown. Here, we present GHOSTEE, a framework for GPU-enabled trusted model inference, designed to isolate groups of layers and their weights. As the developer is tasked with choosing which layers to encrypt, we study the definition of potential encryption guidelines within the context of Convolutional Neural Networks (CNNs). Our investigation is based on ResNet and VGG models, as well as on a VGG-11-inspired autoencoder, all of which exhibit similar trends regarding optimal configurations. To facilitate GPU support for intra-enclave operations, we define a novel memory-efficient matrix masking algorithm for matrix multiplications.
After formally proving its numerical stability, we compare it to related approaches. The findings reveal that our method is the most numerically stable solution that does not require the materialization of additional matrices. We further benchmark the performance of our framework with and without the algorithm, observing faster execution speeds for deep 2D convolutions and fully connected layers with large input batches. Considering the features of GHOSTEE, we edge ever closer to model-agnostic, TEE-based inference pipelines that are applicable in real-world scenarios.
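To make the idea of matrix masking concrete, the sketch below shows a generic additive-masking scheme for offloading a matrix multiplication to an untrusted device. This is an illustrative, Slalom-style example only, not GHOSTEE's actual algorithm (whose details are not given here); the `untrusted_matmul` callable and the precomputed unblinding term are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_matmul(A, B, untrusted_matmul):
    """Compute A @ B without revealing A to the untrusted device.

    Illustrative additive-masking sketch (not GHOSTEE's algorithm):
    a random mask R blinds A before offloading, and the unblinding
    term R @ B is assumed to be precomputed inside the enclave.
    """
    R = rng.standard_normal(A.shape)       # secret mask, enclave-side
    U = R @ B                              # unblinding term (offline phase)
    blinded = untrusted_matmul(A + R, B)   # runs on the untrusted GPU
    return blinded - U                     # recover the true product

# Usage: a plain matmul stands in for the untrusted GPU kernel.
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = masked_matmul(A, B, untrusted_matmul=lambda X, Y: X @ Y)
print(np.allclose(C, A @ B))  # recovered product matches A @ B
```

Note that schemes of this shape trade extra memory for the mask and unblinding term against confidentiality of the offloaded operand, which is precisely the cost the memory-efficient algorithm above aims to avoid.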