Defences against adversarial attacks

The defender has to go first. When we look at the performance of a defense against adversarial examples, it's important to think clearly about what our goals are for the new model. A lot of the time we see models that increase the error rate on the clean test set at the same time that they decrease the error rate on the adversarial test set. So how should we think about navigating this trade-off?

Well, first off, the trade-off is not necessarily fundamental. There are actually cases where adversarially trained models perform better on the clean test set than the original undefended model did. But in most of the recent literature we've seen that if you try to give a model strong robustness to adversarial examples, you usually lose a little bit of accuracy on the clean test set.

The way I think we should think about this is to consider the composition of the actual test set that the model will encounter when we deploy it. Usually in the machine learning literature, when we talk about the test set, we're referring to clean i.i.d. data that comes from the same distribution as the training set. That's probably not what the model is actually going to encounter when it's deployed. So the way you should evaluate your model depends on what you think it will see at deployment time.

A lot of the time in the adversarial example literature we benchmark on the error rate on adversarial examples. That's the metric you would care about if you expected your model to face an adversary on every single input it encounters at deployment time. That's probably not realistic. Instead, there will be an adversary present some portion of the time.

So one thing you might want to do is make a curve like the one I show here. On the x-axis you gradually increase the proportion of inputs that are adversarial examples rather than clean i.i.d. examples. On the y-axis you plot the accuracy of the different models you're considering. Here I plot the top-5 accuracy of three different models on the ImageNet dataset. The green curve is an undefended baseline model, just an Inception v3 network, and then I consider two different defenses against adversarial examples. One of them is the adversarial logit pairing model that we introduced recently; it's currently state of the art on ImageNet. The other one is the mixed PGD defense (M-PGD), which is just another one of the baselines we included in our paper.

When we look at this trade-off curve, we see that on the very left the baseline is better, because on the clean data it has the best accuracy. On the right, adversarial logit pairing is the best, because it has the highest accuracy on adversarial examples. Now let's consider the alternative defense, M-PGD. We see that M-PGD is slightly better on the clean data and worse on the adversarial data. From that we might think that M-PGD navigates a trade-off between clean performance and adversarial performance. But by making this plot we see that M-PGD is not on the top of the trade-off curve at any point across the whole sweep of the proportion of examples that are adversarial. So from this we can see that M-PGD is actually not visiting a useful point in the trade-off space.

We can also see what kind of test set we would need to expect before we prefer to use the defense rather than the undefended baseline. Specifically, we can see where the green curve intersects the orange curve: that's where about 7.1% of the examples are adversarial examples. If fewer than this proportion of the examples are adversarial, then we actually prefer to use the undefended baseline, because there just isn't enough of an adversarial situation to justify a defense that reduces the clean test accuracy.

To make the defense more widely usable, we can do two different things. We can either improve its accuracy on adversarial examples, so that we gain more by trading off clean accuracy, or we can increase its accuracy on the clean data, so that there isn't as much of a trade-off to be paid in the first place.
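
The curve described in this talk can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the original plot. It assumes that a model's accuracy on a mixed test set is the weighted average of its clean accuracy and its adversarial accuracy, which gives a straight line for each model. The function names and all of the accuracy values below are hypothetical placeholders chosen for illustration; they are not the numbers from the talk or from any paper.

```python
def mixed_accuracy(clean_acc, adv_acc, p_adv):
    """Expected accuracy when a fraction p_adv of inputs is adversarial.

    Assumes the deployed test set is a simple mixture of clean and
    adversarial inputs, so accuracy is a weighted average of the two.
    """
    return (1.0 - p_adv) * clean_acc + p_adv * adv_acc


def crossover_fraction(baseline, defended):
    """Adversarial fraction at which `defended` starts to beat `baseline`.

    Each argument is a (clean_acc, adv_acc) pair. Returns None when the
    two lines do not cross in the expected direction, i.e. the defense
    does not trade clean accuracy for adversarial accuracy.
    """
    clean_loss = baseline[0] - defended[0]  # clean accuracy given up
    adv_gain = defended[1] - baseline[1]    # adversarial accuracy gained
    if clean_loss <= 0 or adv_gain <= 0:
        return None
    # Solve (1-p)*cb + p*ab == (1-p)*cd + p*ad for p.
    return clean_loss / (clean_loss + adv_gain)


def ever_best(models, steps=1000):
    """Names of the models that are the top curve for some fraction in [0, 1]."""
    winners = set()
    for i in range(steps + 1):
        p = i / steps
        winners.add(max(models, key=lambda name: mixed_accuracy(*models[name], p)))
    return winners


# Hypothetical (clean accuracy, adversarial accuracy) pairs, NOT real results.
models = {
    "baseline": (0.95, 0.10),
    "defense_a": (0.90, 0.60),
    "defense_b": (0.92, 0.30),
}

if __name__ == "__main__":
    p_star = crossover_fraction(models["baseline"], models["defense_a"])
    print(f"defense_a beats the baseline above p = {p_star:.3f}")
    print("models that are ever on top:", sorted(ever_best(models)))
```

With these placeholder numbers, `defense_b` sits between the other two models at both ends of the sweep yet is never the top curve, which is the same kind of pattern described above for M-PGD. The crossover fraction plays the role of the 7.1% intersection point: below it the undefended baseline is preferable, above it the defense is.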
