stereoplegic's Collections
Adversarial
LTD: Low Temperature Distillation for Robust Adversarial Training
Paper • 2111.02331
Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy
Paper • 1906.06784
Pruning Adversarially Robust Neural Networks without Adversarial Examples
Paper • 2210.04311
Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation
Paper • 2306.16170
Mutual Adversarial Training: Learning together is better than going alone
Paper • 2112.05005
Towards Adversarially Robust Continual Learning
Paper • 2303.17764
Privacy-Preserving Prompt Tuning for Large Language Model Services
Paper • 2305.06212
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Paper • 2310.03693
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Paper • 2308.09662
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
Paper • 2306.04528
On the Adversarial Robustness of Mixture of Experts
Paper • 2210.10253
CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models
Paper • 2206.00052
Fake Alignment: Are LLMs Really Aligned Well?
Paper • 2311.05915
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
Paper • 2311.07587