Description
Traditional and deep learning methods have been widely used to detect software vulnerabilities, but they often struggle to provide consistent and reliable automated solutions. Recent large language models (LLMs) have demonstrated remarkable capabilities in understanding complex patterns in both natural language and code. This thesis explores the potential of fine-tuning state-of-the-art open-source LLMs for the specific task of vulnerability detection. A major challenge in training models for this purpose is the limited availability of high-quality, large-scale datasets. To address this, we construct an extensive, well-curated dataset by combining existing real-world and synthetic datasets. Through careful selection, preprocessing, merging, and additional cleaning, we create a dataset designed to support effective training and evaluation of LLMs for vulnerability detection. We evaluate the effectiveness of fine-tuned LLMs in detecting vulnerabilities in software code, comparing models of different sizes and architectures. Furthermore, we analyze the impact of dataset composition, examining how class balance and data complexity affect model performance. Our results indicate that while fine-tuned LLMs can learn certain patterns and identify some vulnerabilities with high confidence, distinguishing more complex cases remains a challenge, particularly when the presence of a vulnerability hinges on subtle code modifications. Additionally, we observe that model size alone does not determine performance in vulnerability detection, highlighting the importance of dataset size and quality for task-specific fine-tuning.