Is Feature Selection Secure against Training Data Poisoning?

Authors: Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, and Fabio Roli
Year/month: 2015/7
Booktitle: Proceedings of the 32nd International Conference on Machine Learning (ICML'15)
Pages: 1689–1698
Address: Lille, France
Fulltext: main-camera-ready.pdf

Abstract

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.
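To get a quick feel for the stability issue the abstract describes, the sketch below injects a small fraction of poisoned points into a synthetic training set and measures how the set of features selected by LASSO changes. This is only an illustration, not the paper's method: it uses scikit-learn's Lasso on make_regression data with a naive response-flipping baseline, whereas the paper crafts far stronger poisoning points via a gradient-based attack; all parameters here are arbitrary.

```python
# Illustrative sketch (NOT the paper's gradient-based poisoning attack):
# compare the feature set selected by LASSO on clean data with the set
# selected after injecting 5% naively poisoned samples. A Jaccard overlap
# near 1 means the selection is stable; low values mean it is disrupted.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def selected_features(X, y, alpha=0.1):
    """Indices of features given a non-zero coefficient by LASSO."""
    model = Lasso(alpha=alpha).fit(X, y)
    return set(np.flatnonzero(model.coef_))

def jaccard(a, b):
    """Overlap between two feature sets, in [0, 1]."""
    return len(a & b) / max(len(a | b), 1)

# Clean training data: 200 samples, 50 features, 10 truly informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=1.0, random_state=0)
clean = selected_features(X, y)

# Naive poisoning baseline: add 5% random points with flipped, inflated
# responses (a real attacker would optimize these points instead).
n_poison = int(0.05 * len(y))
X_adv = rng.normal(scale=X.std(), size=(n_poison, X.shape[1]))
y_adv = -10.0 * y[:n_poison]
poisoned = selected_features(np.vstack([X, X_adv]),
                             np.concatenate([y, y_adv]))

print(f"clean selection:    {sorted(clean)}")
print(f"poisoned selection: {sorted(poisoned)}")
print(f"Jaccard overlap:    {jaccard(clean, poisoned):.2f}")
```

Even this crude baseline typically perturbs the selected set; the paper's optimized attack shows the effect can be driven close to random feature choices at the same 5% contamination level.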

Bibtex:

@conference{huang_icml15,
  author    = {Huang Xiao and Battista Biggio and Gavin Brown and Giorgio Fumera and Claudia Eckert and Fabio Roli},
  title     = {Is Feature Selection Secure against Training Data Poisoning?},
  year      = {2015},
  month     = {July},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning (ICML'15)},
  address   = {Lille, France},
  pages     = {1689--1698},
  url       = {https://www.sec.in.tum.de/i20/publications/is-feature-selection-secure-against-training-data-poisoning/@@download/file/main-camera-ready.pdf}
}