Description
Announcement: Thesis (MA/BA) project in cooperation with Fraunhofer AISEC
Generate dataset for software vulnerability detection with LLMs
This project can be carried out as a master's or bachelor's thesis.
Motivation and Task Description
High-quality datasets are essential for benchmarking and machine learning tasks. Existing datasets are either synthetic and lack real-world applicability, or they contain real-world code but suffer from poor label quality and a small number of samples. To combine both approaches, previous research has injected synthetic vulnerabilities into real-world code, but these approaches are incomplete and do not guarantee a valid execution path. In this project, we aim to use LLMs to embed realistic vulnerabilities into real-world code. The validity of each vulnerability should be demonstrated by also generating a set of input parameters that triggers it. The result should be a real-world dataset with known vulnerabilities, each accompanied by a corresponding input with a valid execution path.
Requirements
• Programming skills: C/C++ and Python
• Knowledge of IT security and software vulnerabilities
• Knowledge of and interest in machine learning, especially LLMs
• Practical experience in debugging C/C++ applications (e.g. gdb)
• High degree of self-motivation and ability to work independently
Contact
Hannah Schmid, Tobias Specht
Phone: +49 89 322-9986-130, Phone: +49 89 322-9986-187
E-Mail: hannah.schmid@aisec.fraunhofer.de
E-Mail: tobias.specht@aisec.fraunhofer.de
Fraunhofer Research Institution for Applied and Integrated Security (AISEC)
Product Protection and Industrial Security
Lichtenbergstraße 11, 85748 Garching (near Munich), Germany
https://www.aisec.fraunhofer.de