Description
In recent years, the use of face and voice recognition systems has increased greatly. With this increase comes a growth in attacks on such systems, in particular attacks using adversarial examples, and hence a need to defend against them. This thesis investigates two approaches to detecting adversarial examples in an unsupervised setting. Both approaches are independent of the type of input data. Building on an existing supervised two-part framework, we create an unsupervised two-part architecture. Depending on the approach, one or multiple autoencoders are trained to detect adversarial examples based on the hidden activations of a second neural network. We evaluate the approaches in terms of detection rate and show that they successfully detect adversarial examples generated by different attack methods. Furthermore, we discuss why different attack methods should be detected using differently configured autoencoders.
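The detection idea can be illustrated with a minimal sketch. This is not the architecture used in the thesis: a linear autoencoder (equivalent to PCA) stands in for the trained autoencoders, synthetic vectors stand in for the hidden activations of the second network, and the 99th-percentile threshold and all function names are assumptions made for illustration. The detector is fitted on benign activations only, and an input is flagged when its reconstruction error exceeds the threshold.

```python
import numpy as np

def fit_linear_autoencoder(benign_acts, n_components):
    """Fit a linear autoencoder (PCA) on benign activations only."""
    mean = benign_acts.mean(axis=0)
    # Rows of vt are principal directions; keep the leading ones.
    _, _, vt = np.linalg.svd(benign_acts - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(acts, mean, components):
    """Norm of the part of each activation the autoencoder cannot reconstruct."""
    centered = acts - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

def detect(acts, mean, components, threshold):
    """Flag activations whose reconstruction error exceeds the threshold."""
    return reconstruction_error(acts, mean, components) > threshold

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden activations (assumption): benign
# activations lie close to a 3-dimensional subspace of a 20-dimensional space.
basis = rng.normal(size=(3, 20))
benign = rng.normal(size=(500, 3)) @ basis + 0.01 * rng.normal(size=(500, 20))

# Synthetic stand-in for adversarial activations (assumption): the same
# structure plus a large perturbation that leaves the benign subspace.
suspicious = rng.normal(size=(50, 3)) @ basis + 1.0 * rng.normal(size=(50, 20))

mean, components = fit_linear_autoencoder(benign, n_components=3)

# Unsupervised threshold: chosen from benign reconstruction errors alone.
threshold = np.percentile(reconstruction_error(benign, mean, components), 99)

flags_benign = detect(benign, mean, components, threshold)
flags_suspicious = detect(suspicious, mean, components, threshold)
print("benign flagged:", flags_benign.mean())
print("suspicious flagged:", flags_suspicious.mean())
```

Because the threshold is set from benign data alone, no labeled adversarial examples are needed at training time, which is what makes the setting unsupervised. Real activations and real attacks will separate far less cleanly than this synthetic data.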