Comparing Privacy-Preserving Regularization Methods for Machine Learning

Supervisor(s): Mark Gall
Status: finished
Topic: Machine Learning Methods
Author: Florian Koller
Submission: 2025-02-21
Type of Thesis: Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Classic machine learning algorithms are used to find trends in data. A model can
represent such a trend as a function and use it to make predictions on new data.
Logistic regression is one such method and can be used to build classification
models. If a model generalizes poorly to new data, regularization can improve its
performance. However, all data used to train a model is exposed to the model
provider. This poses a privacy problem. In this thesis, we show that fully homomorphic
encryption (FHE) can be used to address this problem by encrypting all data and model
weights. We found that it is possible to perform logistic regression on small datasets
using FHE by adapting the logistic regression algorithm. This comes with an overhead
in computation time; however, the difference in the model’s accuracy is negligible.
This hides the sensitive information in the data from the model provider. We
show that L2 regularization can be efficiently performed in FHE as well. We found that
L1 regularization is possible in FHE, but the computation overhead makes its usage
impractical. Our results demonstrate the feasibility of performing logistic
regression in a privacy-preserving manner with L2 regularization. They also show
the challenges of applying L1 regularization under FHE. We anticipate further
research into the efficiency of logistic regression using FHE to support larger datasets,
more iterations, and faster training times. Furthermore, we expect more research into
making the L1 term practical in FHE and the use of other regularization terms.
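To make the core idea concrete, the following is a minimal plaintext sketch (in NumPy) of an FHE-friendly training loop for logistic regression with L2 regularization. It is not the thesis's actual implementation. The sketch assumes the standard adaptation used in FHE settings: since FHE schemes evaluate only additions and multiplications, the sigmoid is replaced by a low-degree polynomial approximation (the coefficients below are illustrative, not taken from the thesis).

```python
import numpy as np

def poly_sigmoid(z):
    # Degree-3 polynomial stand-in for the logistic function, a common
    # FHE adaptation; coefficients are illustrative, chosen so that the
    # curve roughly tracks sigmoid(z) on moderate inputs.
    return 0.5 + 0.15 * z - 0.0015 * z**3

def train_logreg_l2(X, y, lam=0.1, lr=0.1, iters=30):
    """Gradient descent for logistic regression with an L2 penalty.

    The L2 gradient term (lam * w) uses only multiplication and addition,
    so it maps directly onto FHE circuits. An L1 penalty would require
    sign(w), which is costly to evaluate homomorphically -- one intuition
    for why the thesis finds L1 impractical under FHE.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = poly_sigmoid(X @ w)               # approximate predictions
        grad = X.T @ (p - y) / n + lam * w    # data gradient + L2 term
        w -= lr * grad
    return w

# Tiny synthetic example: two roughly separable clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w = train_logreg_l2(X, y)
acc = np.mean((poly_sigmoid(X @ w) > 0.5) == y)
```

Under FHE, the same arithmetic would run on encrypted `X`, `y`, and `w`; the loop structure is unchanged, but each multiplication consumes ciphertext depth, which is why small datasets and few iterations are the practical regime.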