Complex valued neural networks for antispoofing

Interspeech 2023

Authors: Nicolas M. Mueller, Philip Sperl, and Konstantin Boettinger
Year: 2023
Booktitle: Interspeech 2023
Fulltext: https://doi.org/10.48550/arXiv.2308.11800

Abstract

Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or Melspectrograms) or raw audio processed through convolution or sinc-layers. Both methods have drawbacks: magnitude spectrograms discard phase information, which affects audio naturalness, and raw-feature-based models cannot use traditional explainable AI methods. This paper proposes a new approach that combines the benefits of both methods by using complex-valued neural networks to process the complex-valued, CQT frequency-domain representation of the input audio. This method retains phase information and allows for explainable AI methods. Results show that this approach outperforms previous methods on the "In-the-Wild" anti-spoofing dataset and enables interpretation of the results through explainable AI. Ablation studies confirm that the model has learned to use phase information to detect voice spoofing.
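
The role of phase described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a windowed FFT stands in for the CQT, and `complex_spectrum`, `complex_linear`, the test signal, and all parameter values are hypothetical choices made for illustration. The sketch scrambles the phase of a complex spectrum while keeping its magnitudes, then checks whether magnitude features and a complex-valued linear layer can tell the two apart.

```python
import numpy as np

def complex_spectrum(x, frame_len=256, hop=128):
    """Windowed FFT per frame. Assumption: a simple stand-in for the paper's CQT."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # complex array, shape (n_frames, frame_len // 2 + 1)

def complex_linear(z, w_re, w_im):
    """Complex product z @ (w_re + 1j * w_im), written with real-valued matmuls."""
    out_re = z.real @ w_re - z.imag @ w_im
    out_im = z.real @ w_im + z.imag @ w_re
    return out_re + 1j * out_im

# Hypothetical test signal: a 440 Hz tone plus a little noise, sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(2048) / 16000.0
x = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(t.size)

Z = complex_spectrum(x)
# Keep every magnitude, but replace every phase with a random angle.
Z_scrambled = np.abs(Z) * np.exp(1j * rng.uniform(-np.pi, np.pi, Z.shape))

# Random weights for a single complex-valued layer with 8 outputs (illustrative only).
w_re = rng.standard_normal((Z.shape[1], 8))
w_im = rng.standard_normal((Z.shape[1], 8))

same_magnitude = np.allclose(np.abs(Z), np.abs(Z_scrambled))
same_complex_output = np.allclose(
    complex_linear(Z, w_re, w_im), complex_linear(Z_scrambled, w_re, w_im)
)
print("magnitude features identical:", same_magnitude)
print("complex layer outputs identical:", same_complex_output)
```

A magnitude-only model receives the same input for both spectra, whereas the complex-valued layer produces different outputs. That is the property the abstract attributes to retaining phase information.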

Bibtex:

@inproceedings{mueller2023complex,
  author = {Nicolas M. Mueller and Philip Sperl and Konstantin Boettinger},
  title = {Complex valued neural networks for antispoofing},
  year = {2023},
  booktitle = {Interspeech 2023},
  url = {https://doi.org/10.48550/arXiv.2308.11800}
}