Towards Deep Learning Models Resistant to Adversarial Attacks
Authors: Aleksander Mądry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
ICLR 2018
No ISBN/ISSN explicitly provided in the document.
This paper addresses the issue that deep learning models are vulnerable to adversarial attacks,
and proposes Adversarial Training as a method to enhance model robustness.
Specifically, the authors employ Projected Gradient Descent (PGD) attacks to train robust networks
and conduct experiments on MNIST and CIFAR-10 datasets to verify its effectiveness.
In this review, I summarize the key findings of the paper and extend its scope by conducting additional experiments on FashionMNIST and SVHN
to examine whether the proposed method is effective across different datasets.
Deep learning models are highly susceptible to adversarial attacks
Adversarial examples: Inputs altered by subtle, imperceptible modifications that cause the model to misclassify them
Example:
The original image is correctly classified as a panda
After adding a small amount of adversarial noise, the model misclassifies it as a gibbon
Research Questions:
How can deep learning models be made more robust against adversarial attacks?
What are the limitations of conventional defense methods?
✅ Proposes Adversarial Training as a robust optimization approach
✅ Utilizes PGD (Projected Gradient Descent) attacks for model training
✅ Conducts experiments on MNIST and CIFAR-10 to validate the approach
FGSM (Fast Gradient Sign Method): Generates adversarial examples with a single gradient-sign step
PGD (Projected Gradient Descent):
Iteratively applies small gradient-sign steps to craft stronger adversarial examples
After each step, projects back onto the allowed perturbation set to find the most damaging attack direction (see the sketch below)
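As a concrete reference, here is a minimal PyTorch sketch of the two attacks, assuming a classifier trained with cross-entropy on inputs scaled to [0, 1]; model, epsilon, alpha, and num_steps are illustrative placeholders rather than the paper's exact settings.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    # Single-step attack: move each input by epsilon in the sign of the loss gradient
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def pgd_attack(model, x, y, epsilon, alpha, num_steps):
    # Multi-step attack: repeated gradient-sign steps of size alpha, each projected
    # back into the epsilon L-infinity ball around the original input (pixel range [0, 1])
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()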
Standard Model:
Easily misclassifies inputs under adversarial attacks
Adversarially Trained Model:
Learns to be more robust by training with both clean and adversarial examples (training-loop sketch after this block)
Result:
Shows significantly higher robustness against adversarial attacks
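To illustrate how the robust model is obtained, the following sketch runs one epoch of PGD adversarial training, reusing the pgd_attack helper above; model, optimizer, and train_loader are assumed to be a standard PyTorch setup, and combining the clean and adversarial losses (as described above) is one of several reasonable variants.

import torch.nn.functional as F

def adversarial_training_epoch(model, train_loader, optimizer, epsilon, alpha, num_steps, device="cpu"):
    # One epoch of PGD adversarial training: craft adversarial examples on the fly
    # with the current model parameters, then update on clean + adversarial batches
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, epsilon, alpha, num_steps)  # helper from the attack sketch
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)  # clean + adversarial loss
        loss.backward()
        optimizer.step()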
MNIST:
Adversarial Training effectively maintains robustness (above 98%)
CIFAR-10:
Less effective than on MNIST, with accuracy degrading as the adversarial perturbation budget increases
Since the paper only tested MNIST and CIFAR-10,
→ I extended the experiments to FashionMNIST and SVHN
FashionMNIST Results:
Maintained 80.34% Adversarial Accuracy
Showed a relatively stable robustness
SVHN Results:
Maintained 52.52% Adversarial Accuracy (relatively lower)
Due to complex backgrounds and digit variations, Adversarial Training was less effective (evaluation sketch below)
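The adversarial accuracies reported above are the fraction of PGD-perturbed test inputs the trained model still classifies correctly. Below is a minimal evaluation sketch under that assumption, reusing pgd_attack from the attack sketch; the dataset paths, batch size, and attack hyperparameters are placeholders, and this is an illustration rather than the exact code behind the 80.34% and 52.52% figures.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def adversarial_accuracy(model, loader, epsilon, alpha, num_steps, device="cpu"):
    # Fraction of test inputs still classified correctly after a PGD attack
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, epsilon, alpha, num_steps)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total

# Example test loaders (torchvision downloads the datasets on first use)
to_tensor = transforms.ToTensor()
fashion_loader = DataLoader(datasets.FashionMNIST("data", train=False, download=True, transform=to_tensor), batch_size=128)
svhn_loader = DataLoader(datasets.SVHN("data", split="test", download=True, transform=to_tensor), batch_size=128)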
📌 Conclusion:
✅ Adversarial Training is effective for certain datasets, but
✅ Performance varies across different datasets
1️⃣ Focuses only on PGD attacks (limited testing against stronger attacks such as CW and AutoAttack)
2️⃣ Less effective on CIFAR-10 than on MNIST
1️⃣ Significant drop in clean (non-adversarial) accuracy on CIFAR-10
2️⃣ FashionMNIST remains vulnerable to CW and AutoAttack (see the evaluation sketch after this list)
3️⃣ SVHN shows weak robustness despite Adversarial Training
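For context, the CW and AutoAttack findings above can be checked with an off-the-shelf attack suite. The sketch below shows one way to measure robust accuracy with the third-party auto-attack package by Croce & Hein, following its documented interface; the function name and epsilon value are illustrative, and a CW attack would need a separate implementation (e.g., from a library such as torchattacks).

import torch
from autoattack import AutoAttack  # third-party package by Croce & Hein

def autoattack_accuracy(model, x_test, y_test, epsilon):
    # Runs the standard AutoAttack suite (APGD-CE, APGD-T, FAB-T, Square) on the
    # full test tensors (values in [0, 1]) and returns the resulting robust accuracy
    adversary = AutoAttack(model, norm="Linf", eps=epsilon, version="standard")
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    return (preds == y_test).float().mean().item()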
✅ Develop models that are robust against attacks beyond PGD (e.g., CW, AutoAttack)
✅ Explore ways to mitigate accuracy drop in CIFAR-10
✅ Conduct additional experiments on diverse datasets
📌 Key Takeaways from this Study:
1️⃣ Adversarial Training is highly effective on MNIST but has limitations on CIFAR-10
2️⃣ When applied to FashionMNIST and SVHN, performance varies across datasets
3️⃣ CW and AutoAttack reveal vulnerabilities that PGD-based training cannot fully mitigate
📌 Final Thoughts:
✅ Adversarial Training is a powerful defense mechanism, but not a universal solution
✅ More research is needed to improve robustness across various attack types and datasets