Description
The term deepfake describes synthetically generated media content that is often
created with malicious intent. Audio deepfakes are a subgenre of deepfakes that
spoof audio recordings. Advances in research, growing computing power, and
ever-larger datasets lead to better and more convincing deepfakes. This, in turn,
enables misuse such as fraud or defamation, which underlines the importance of
countermeasures.
Current research focuses almost exclusively on the detection of deepfakes,
i.e. "Is this media content computer-generated or is it real?". But, as practiced in all
subfields of cybersecurity, knowing "Who attacked me?" is just as essential for setting
up defenses.
This is why this thesis sets out to identify the attacker, i.e. the creator of an
audio deepfake. We present both traditional and machine-learning-based methods
to create a so-called attacker signature that is unique to each attacker. These methods
are then evaluated on two large audio deepfake datasets: one where the attackers are
known (ASVspoof19) and one where they are not (ASVspoof21). The evaluation shows
that a traditional approach is not suitable for creating attacker signatures, i.e. for
identifying the attackers. An embedding-based model, on the other hand, is able to form
clusters for each attacker, even when the audio recordings or the attacker are unknown.
In summary, this means it is possible to identify who created an audio deepfake.
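To make the idea of attacker signatures forming clusters concrete, the following
minimal sketch clusters fixed-length embeddings and scores how well the clusters
separate. It is purely illustrative: the embeddings here are synthetic stand-ins
(random Gaussian blobs), and the embedding dimensionality and number of attackers
are assumptions, not the pipeline used in the thesis.

    # Sketch: cluster attacker-signature embeddings with scikit-learn.
    # The embeddings are synthetic stand-ins; in the thesis setting they
    # would come from a trained embedding model applied to deepfake audio.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(0)
    n_attackers, per_attacker, dim = 5, 40, 128  # assumed sizes

    # One Gaussian blob per hypothetical attacker.
    centers = rng.normal(size=(n_attackers, dim))
    X = np.vstack(
        [c + 0.1 * rng.normal(size=(per_attacker, dim)) for c in centers]
    )

    # Cluster the signatures; a high silhouette score indicates that the
    # embedding space separates the attackers cleanly.
    labels = KMeans(n_clusters=n_attackers, n_init=10, random_state=0).fit_predict(X)
    print("silhouette:", silhouette_score(X, labels))

In the thesis setting, X would instead hold signatures extracted from deepfake
recordings, and well-separated clusters indicate that the embedding space encodes
attacker identity.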