Description
Advances in cloud computing have created an environment in which multiple distributed
systems routinely operate in unison. Access control in such a complex problem
space is difficult. With the rise of policy-as-code, languages such as Rego consolidate authorization
decisions into a single logical document for a centralized evaluator. This thesis builds on recent
advancements in specialized code Large Language Models (LLMs) for the generation of
ready-to-evaluate policy documents from natural language (NL) descriptions of Role-Based
Access Control (RBAC) definitions. Unlike previous research, this approach achieves a full
end-to-end solution by treating the output space as a code generation task. I also present a
methodology for transforming a dataset of role-action-object tuples into Rego code and generating
synthetic users with corresponding unit tests. After fine-tuning CodeT5+ on this data, validating the
generated outputs against these unit tests shows a high pass rate,
good semantic understanding, and excellent syntactic correctness. The findings demonstrate
that producing policy-as-code from NL is well suited to transformer-based code
LLMs.
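
To make the target output space concrete, the sketch below is a minimal, hypothetical example of the kind of ready-to-evaluate RBAC policy a role-action-object tuple can be lowered to; the package name, role and object names, and input shape are illustrative assumptions rather than material from the thesis dataset.

    package rbac.authz

    import rego.v1

    # Deny by default unless an explicit rule grants access.
    default allow := false

    # Hypothetical mapping from role to permitted (action, object) pairs.
    role_grants := {
        "editor": {{"action": "update", "object": "article"}},
        "viewer": {{"action": "read", "object": "article"}},
    }

    # Allow when any of the caller's roles grants the requested action on the object.
    allow if {
        some role in input.user.roles
        some grant in role_grants[role]
        grant.action == input.action
        grant.object == input.object
    }

Under these assumptions, a query against data.rbac.authz.allow with input {"user": {"roles": ["editor"]}, "action": "update", "object": "article"} evaluates to true, and synthetic users with known role assignments yield the expected unit-test outcomes.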