Applying NLP for AC policy generation from responsibility definitions

Supervisor(s):	Christian Banse
Status:	finished
Topic:	Others
Author:	Marc Ziegler
Submission:	2024-07-15
Type of Thesis:	Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching
Description Advances in cloud computing have made it common for multiple distributed systems to work in unison. Access control in such a complex problem space is difficult. With the rise of policy-as-code, languages like Rego reduce authorization decisions to a single logical document for a centralized evaluator. This thesis builds on recent advances in specialized code Large Language Models (LLMs) to generate ready-to-evaluate policy documents from natural language (NL) descriptions of Role-Based Access Control (RBAC) definitions. Unlike previous research, this approach achieves a full end-to-end solution by treating policy generation as a code generation task. I also present a methodology for transforming a dataset of role-action-object tuples into Rego code and for generating synthesized users with corresponding unit tests. After fine-tuning CodeT5+ on this data, running these unit tests against the generated outputs shows a high pass rate, good semantic understanding, and excellent syntactic correctness. The findings demonstrate that producing policy-as-code from NL is a suitable task for transformer-based code LLMs.
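To make the dataset transformation in the description more concrete, here is a minimal sketch of how role-action-object tuples might be rendered as a Rego policy with a matching unit test. It is not the thesis's actual pipeline: the function names `render_policy` and `render_test`, the input layout (`input.user.role`, `input.action`, `input.object`), the package name `rbac`, and the sample tuples are all assumptions made for illustration.

```python
import json

# Hypothetical sample data: (role, action, object) tuples.
GRANTS = [
    ("nurse", "read", "patient_record"),
    ("doctor", "update", "patient_record"),
]


def render_policy(grants, package="rbac"):
    """Render tuples as a deny-by-default Rego policy (assumed input layout)."""
    lines = [
        f"package {package}",
        "",
        "import rego.v1",
        "",
        "default allow := false",
        "",
    ]
    for role, action, obj in grants:
        # One allow rule per tuple; json.dumps yields a quoted Rego string.
        lines += [
            "allow if {",
            f"    input.user.role == {json.dumps(role)}",
            f"    input.action == {json.dumps(action)}",
            f"    input.object == {json.dumps(obj)}",
            "}",
            "",
        ]
    return "\n".join(lines)


def render_test(role, action, obj, expected, package="rbac"):
    """Render one Rego test rule for a synthesized user request.

    Assumes role, action, and obj are valid identifier fragments. The rule
    is meant to be placed in a test package that also imports rego.v1.
    """
    request = json.dumps({"user": {"role": role}, "action": action, "object": obj})
    verdict = "allowed" if expected else "denied"
    negation = "" if expected else "not "
    return "\n".join([
        f"test_{role}_{action}_{obj}_{verdict} if {{",
        f"    {negation}data.{package}.allow with input as {request}",
        "}",
    ])


if __name__ == "__main__":
    print(render_policy(GRANTS))
    print(render_test("nurse", "read", "patient_record", True))
    print(render_test("nurse", "delete", "patient_record", False))
```

In a setup like this, the rendered tests could be run against a model-generated policy rather than the reference one, so the pass rate reflects whether the generated policy grants and denies the same requests.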
