Is Feature Selection Secure against Training Data Poisoning?
Authors: Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, and Fabio Roli
Year/month: 2015/7
Booktitle: Proceedings of The 32nd International Conference on Machine Learning (ICML'15)
Pages: 1689–1698
Address: Lille, France
Fulltext: main-camera-ready.pdf
Abstract:
Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.
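Example (illustrative Python sketch):
For readers who want to probe this effect themselves, the sketch below (assuming numpy and scikit-learn) measures how stable the LASSO-selected feature set is when a small fraction of crafted points is appended to the training set. The crafted points here are a naive stand-in, not the paper's gradient-based poisoning attack, and every parameter (the synthetic data, the 5% poisoning rate, alpha=0.05) is an illustrative assumption.

# Minimal sketch: stability of LASSO feature selection under naive poisoning.
# NOT the paper's gradient-based attack; all parameters are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic regression task: 10 informative features out of 50.
n, d, k = 400, 50, 10
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:k] = rng.uniform(1.0, 2.0, size=k)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def selected_features(X, y, alpha=0.05):
    """Indices of features with non-zero LASSO coefficients."""
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    return set(np.flatnonzero(lasso.coef_))

clean = selected_features(X, y)

# Naive poisoning stand-in: 5% extra points with large responses that
# contradict the true relationship (the paper instead optimizes each
# poisoning point via gradient ascent on the learner's objective).
n_poison = int(0.05 * n)
X_p = rng.standard_normal((n_poison, d))
y_p = -10.0 * (X_p @ w_true)
X_all = np.vstack([X, X_p])
y_all = np.concatenate([y, y_p])

poisoned = selected_features(X_all, y_all)

# Jaccard overlap between clean and poisoned feature sets:
# 1.0 means the selection is unchanged; values near 0 correspond to the
# near-random selection the paper reports under its much stronger attack.
jaccard = len(clean & poisoned) / len(clean | poisoned)
print(f"clean:    {sorted(clean)}")
print(f"poisoned: {sorted(poisoned)}")
print(f"Jaccard overlap: {jaccard:.2f}")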
Bibtex:
@conference{huang_icml15,
  author    = {Huang Xiao and Battista Biggio and Gavin Brown and Giorgio Fumera and Claudia Eckert and Fabio Roli},
  title     = {Is Feature Selection Secure against Training Data Poisoning?},
  year      = {2015},
  month     = {July},
  booktitle = {Proceedings of The 32nd International Conference on Machine Learning (ICML'15)},
  address   = {Lille, France},
  pages     = {1689--1698},
  url       = {https://www.sec.in.tum.de/i20/publications/is-feature-selection-secure-against-training-data-poisoning/@@download/file/main-camera-ready.pdf}
}