AlignmentResearch/robust_llm_oskar-076a_clf_jailbreak_completions_Llama3.1-8B-Instruct_s-0 Updated 24 days ago • 460
AlignmentResearch/robust_llm_oskar-075a_clf_jailbreak_inputs_Llama3.1-8B-Instruct_s-0 Updated 24 days ago • 344
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Paper • 2203.07475 • Published Mar 14, 2022