Description
As more and more aspects of our daily lives move online, the security of online data becomes increasingly
important. Injections - i.e., the insertion of unwanted strings - are among the most common threats to online
security. Identifying potential injection vulnerabilities in source code is crucial to preventing this type of attack.
Static analysis tools, which infer the behavior of programs from their source code and are used for this
detection, can benefit from precise information about the values of potentially vulnerable strings. Currently,
most available static analysis tools have no way of obtaining such information.
In this thesis, we adapt an approach to extract formal grammars that describe strings from a code property graph.
We then approximate these grammars by strongly regular grammars and generate human-readable regular expressions from
them. To accomplish this, we transform the approximated grammars into automata and apply the state elimination strategy.
The obtained regular expressions describe properties of the values the analyzed strings can take on and match all such values.
We provide a proof-of-concept implementation and show that it is able to accurately describe complex strings. We believe that,
with further refinement, the information our approach provides can be used to successfully detect, for example, SQL injection
vulnerabilities. This can increase the capabilities of static analysis tools and help prevent security issues.
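
As a rough illustration of the state elimination step mentioned above, the following Python sketch converts a small automaton into a regular expression. It is not the thesis implementation: the function name `state_elimination`, the dictionary-based transition table, and the example automaton are assumptions made for this sketch, and transition labels are assumed to be regex fragments that are already atomic or grouped.

```python
import re
from itertools import product


def union(a, b):
    """Alternation of two regex fragments; None stands for 'no edge'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else f"(?:{a}|{b})"


def star(a):
    """Kleene star of a self-loop label; a missing loop contributes nothing."""
    return "" if a is None or a == "" else f"(?:{a})*"


def state_elimination(states, start, finals, transitions):
    """Turn an automaton into a regex by eliminating its states one by one.

    transitions maps (source, target) to a regex fragment (hypothetical layout).
    Returns None if no accepting path exists.
    """
    edges = dict(transitions)
    # Fresh start and final states, connected by epsilon (empty) labels.
    new_start, new_final = object(), object()
    edges[(new_start, start)] = ""
    for f in finals:
        edges[(f, new_final)] = union(edges.get((f, new_final)), "")

    for s in states:
        loop = star(edges.get((s, s)))
        preds = [(p, r) for (p, q), r in edges.items() if q == s and p != s]
        succs = [(q, r) for (p, q), r in edges.items() if p == s and q != s]
        # Every path p -> s -> q is replaced by a direct edge p -> q.
        for (p, r_in), (q, r_out) in product(preds, succs):
            bypass = f"{r_in}{loop}{r_out}"
            edges[(p, q)] = union(edges.get((p, q)), bypass)
        # Drop all edges that touch the eliminated state.
        edges = {k: v for k, v in edges.items() if s not in k}

    return edges.get((new_start, new_final))


# Example automaton (assumed): q0 -a-> q1, q1 -b-> q1, q1 -c-> q0, q1 accepting.
regex = state_elimination(
    ["q0", "q1"], "q0", ["q1"],
    {("q0", "q1"): "a", ("q1", "q1"): "b", ("q1", "q0"): "c"},
)
print(regex)
print(bool(re.fullmatch(regex, "abcab")))
```

The resulting expression depends on the elimination order, so different orders can yield syntactically different but equivalent expressions. A real implementation would likely also simplify the output to keep it human-readable.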