Applying NLP for AC policy generation from responsibility definitions

Supervisor(s): Christian Banse
Status: finished
Topic: Others
Author: Marc Ziegler
Submission: 2024-07-15
Type of Thesis: Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Advances in cloud computing have created an environment in which multiple distributed
systems working in unison are commonplace. Access control in such a complex problem
space is difficult. With the rise of policy-as-code, languages like Rego consolidate authorization
decisions into one logical document for a centralized evaluator. This thesis builds on recent
advancements in specialized code Large Language Models (LLMs) for the generation of
ready-to-evaluate policy documents from natural language (NL) descriptions of Role-Based
Access Control (RBAC) definitions. Unlike previous research, this approach achieves a full
end-to-end solution by treating the output space as a code generation task. I also present a
methodology to transform a dataset of role-action-object tuples into Rego code and to generate
synthetic users with corresponding unit tests. After fine-tuning CodeT5+ on this data, validating
the generated outputs against these unit tests shows a high pass rate, good semantic
understanding, and excellent syntactic correctness. The findings demonstrate that producing
policy-as-code from NL is a suitable task for transformer-based code LLMs.
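
The transformation from role-action-object tuples to Rego code and unit tests can be illustrated with a minimal sketch. It is not the thesis's actual pipeline, which is not shown on this page. The example tuples, the package name `rbac`, the policy layout, and the helper names `build_permissions`, `render_rego`, and `synthesize_tests` are all assumptions made for illustration. The sketch only renders the policy as text and does not call OPA.

```python
import json
from collections import defaultdict

# Hypothetical example data: (role, action, object) tuples.
TUPLES = [
    ("nurse", "read", "patient_record"),
    ("doctor", "read", "patient_record"),
    ("doctor", "update", "patient_record"),
]


def build_permissions(tuples):
    """Group (role, action, object) tuples into a role -> permissions map."""
    perms = defaultdict(list)
    for role, action, obj in tuples:
        perms[role].append({"action": action, "object": obj})
    return dict(perms)


def render_rego(tuples, package="rbac"):
    """Render the tuples as Rego policy text (assumed layout, not evaluated here)."""
    # JSON object literals are also valid Rego literals.
    perms = json.dumps(build_permissions(tuples), indent=4, sort_keys=True)
    return "\n".join([
        f"package {package}",
        "",
        "import rego.v1",
        "",
        "default allow := false",
        "",
        f"permissions := {perms}",
        "",
        "allow if {",
        "    some role in input.roles",
        "    some perm in permissions[role]",
        "    perm.action == input.action",
        "    perm.object == input.object",
        "}",
        "",
    ])


def synthesize_tests(tuples):
    """Create one synthetic user per role and one test case per (action, object) pair."""
    perms = build_permissions(tuples)
    pairs = sorted({(action, obj) for _, action, obj in tuples})
    cases = []
    for role in sorted(perms):
        granted = {(p["action"], p["object"]) for p in perms[role]}
        for action, obj in pairs:
            cases.append({
                "input": {
                    "user": f"user_{role}",  # synthetic user name
                    "roles": [role],
                    "action": action,
                    "object": obj,
                },
                # Expected decision: allowed only if the role holds the permission.
                "expected": (action, obj) in granted,
            })
    return cases


if __name__ == "__main__":
    print(render_rego(TUPLES))
    for case in synthesize_tests(TUPLES):
        print(case)
```

Each generated case pairs an input document with the expected `allow` decision, so a model-generated policy could be checked by evaluating it against every case and counting matches. In a real setup the cases would more likely be rendered as Rego `test_` rules and run with `opa test`.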