Description
In the last few decades, the number of machine learning applications has been steadily increasing. Although their architectures and specific purposes vary widely, they all have in common that they require large amounts of input data to learn from. This development, together with commercially available "machine learning as a service" offerings from providers such as Google and Amazon, calls for greater awareness of data privacy in the context of machine learning. On the one hand, distributed learning techniques such as federated learning promise to improve privacy by keeping training data at its sources rather than uploading it to a central (potentially untrusted) service. On the other hand, existing work has already demonstrated various attacks on the privacy of ML models, e.g. membership or feature inference. Privacy, however, is not a precise concept, and various goals and metrics have evolved to quantify it in different domains. This thesis addresses the issue of handling privacy within machine learning by formulating a guideline that supports the selection of suitable privacy metrics for a machine learning application. To this end, the applicability of several known privacy metrics, such as k-anonymity, differential privacy, or the adversary's success probability, is first examined in the context of machine learning. Recommendations for the use of these metrics, depending on the properties of the respective machine learning application, are then formulated. In order to evaluate the applicability of the guideline in combination with a distributed learning setup, a neural network classifying the records of the UCI Adult dataset is implemented and examined according to the guideline.
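As background for one of the metrics named above, the standard definition of epsilon-differential privacy can be stated compactly; this formulation is general textbook material and is not quoted from the thesis itself. A randomized mechanism M is epsilon-differentially private if, for all datasets D and D' differing in a single record and for every set S of possible outputs,

\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S].
\]

Smaller values of epsilon mean that the presence or absence of any single record has less influence on the mechanism's output, which is the property a guideline for selecting privacy metrics would weigh against other metrics such as k-anonymity or the adversary's success probability.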