[BA] Defenses to Adversarial Examples in Problem Space
Adversarial examples pose a problem for machine-learning-based classifiers: they allow an attacker to cause misclassification with only small changes to the input data.
For Android malware, such examples can be generated automatically in problem space, assuming a white-box attack model.
This thesis shows that regularization techniques such as Dropout and CutMix, while improving performance on unmodified data, do little to thwart adversarial evasion attacks.
On the other hand, adversarial training on adversarial examples in problem space works well and allows the classifier to detect most adversarial examples.
Similar performance can be achieved by generating adversarial examples in feature space during training.
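A common way to generate feature-space adversarial examples during training is a gradient-sign attack in the style of FGSM. The sketch below illustrates the idea against a logistic-regression classifier; the weights, feature vector, and perturbation budget `epsilon` are purely illustrative assumptions, not the thesis's actual classifier or attack.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y, epsilon):
    """Move x by epsilon in the direction that increases the classifier's loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input x is (p - y) * w, where p is the
    predicted probability of the positive (malicious) class.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Illustrative toy classifier and a sample it currently flags as malicious.
w = np.array([2.0, -1.0, 0.5])
b = -0.2
x = np.array([1.0, 0.0, 1.0])  # e.g. binary API-call features

x_adv = fgsm_perturb(x, w, b, y=1.0, epsilon=0.8)

print(sigmoid(w @ x + b) > 0.5)      # original sample: classified malicious
print(sigmoid(w @ x_adv + b) > 0.5)  # perturbed sample: pushed toward benign
```

Note that the perturbed feature vector takes fractional values, which no real APK may map to; this gap between feature space and problem space is exactly why problem-space attacks and defenses are studied separately.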
Classifiers are evaluated on an unseen, temporally separated test set with a realistic class ratio, and it is determined that performance on normal data suffers only slightly when training adversarially.
Time-sensitive separation of training and test data is also shown to be important for evaluating resistance to adversarial examples: detection is significantly easier on validation data contemporary to the training set.