Comparing Privacy-Preserving Regularization Methods for Machine Learning

Supervisor(s): Mark Gall
Status: finished
Topic: Machine Learning Methods
Author: Florian Koller
Submission: 2025-02-21
Type of Thesis: Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Classic machine learning algorithms are used to find trends in data. A model can
represent such a trend as a function and use it to make predictions on new data.
Logistic regression is one such method and can be used to build classification
models. If a model generalizes poorly to new data, regularization can improve its
performance. However, all data used to train a model is exposed to the model
provider. This poses a privacy problem. In this thesis, we show that fully homomorphic
encryption (FHE) can be used to address this problem by encrypting all data and model
weights. We found that it is possible to perform logistic regression on small datasets
using FHE by adapting the logistic regression algorithm. This comes with an overhead
in computation time; however, the difference in the model’s accuracy is negligible.
This hides the sensitive information in the data from the model provider. We
show that L2 regularization can be efficiently performed in FHE as well. We found that
L1 regularization is possible in FHE, but the computation overhead makes its usage
impractical. Our results demonstrate the feasibility of performing logistic
regression in a privacy-preserving manner with L2 regularization. They also show
the challenges of applying L1 regularization under FHE. We anticipate further
research into the efficiency of logistic regression using FHE to support larger datasets,
more iterations, and faster training times. Furthermore, we expect more research into
making the L1 term practical in FHE and the use of other regularization terms.
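To make the core idea concrete, the following is a minimal plaintext sketch (in NumPy) of an FHE-friendly training loop for logistic regression with L2 regularization. It is not the thesis's actual implementation. The sketch assumes the standard adaptation used in FHE settings: since FHE schemes evaluate only additions and multiplications, the sigmoid is replaced by a low-degree polynomial approximation (the coefficients below are illustrative, not taken from the thesis).

```python
import numpy as np

def poly_sigmoid(z):
    # Degree-3 polynomial stand-in for the logistic function, a common
    # FHE adaptation; coefficients are illustrative, chosen so that the
    # curve roughly tracks sigmoid(z) on moderate inputs.
    return 0.5 + 0.15 * z - 0.0015 * z**3

def train_logreg_l2(X, y, lam=0.1, lr=0.1, iters=30):
    """Gradient descent for logistic regression with an L2 penalty.

    The L2 gradient term (lam * w) uses only multiplication and addition,
    so it maps directly onto FHE circuits. An L1 penalty would require
    sign(w), which is costly to evaluate homomorphically -- one intuition
    for why the thesis finds L1 impractical under FHE.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = poly_sigmoid(X @ w)               # approximate predictions
        grad = X.T @ (p - y) / n + lam * w    # data gradient + L2 term
        w -= lr * grad
    return w

# Tiny synthetic example: two roughly separable clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w = train_logreg_l2(X, y)
acc = np.mean((poly_sigmoid(X @ w) > 0.5) == y)
```

Under FHE, the same arithmetic would run on encrypted `X`, `y`, and `w`; the loop structure is unchanged, but each multiplication consumes ciphertext depth, which is why small datasets and few iterations are the practical regime.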