Description
As deep learning-based approaches continue to achieve strong results and their use becomes more widespread, it becomes increasingly important to examine their behavior in adversarial settings. Unfortunately, neural networks are vulnerable to so-called adversarial examples: inputs intentionally designed to cause machine learning systems to malfunction. Despite ongoing research efforts, no reliable defense has been found so far, meaning that today’s state-of-the-art deep learning approaches remain vulnerable. These efforts have produced a number of methods for detecting adversarial examples, many of which have since been shown to be ineffective. In this thesis, we examine the limitations of two popular adversarial example detection methods. We find that both methods are effective in a zero-knowledge threat model, in which the attacker is unaware of the detector, achieving ROC-AUC above 80% across a number of dataset-attack configurations.