Llama 3 8B Instruct no refusal
This model uses orthogonal feature ablation, as described in this paper.
Calibration data:
- 256 prompts from jondurbin/airoboros-2.2
- 256 prompts from AdvBench
- The refusal direction is extracted between layers 16 and 17
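
Until the actual code is posted, here is a minimal sketch of what extracting a direction and ablating it orthogonally could look like. It is not the code used for this model. It assumes the direction is the normalized difference between mean activations on the two prompt sets, and it uses random NumPy arrays as stand-ins for real activations and weights. The names `refusal_direction`, `ablate_direction`, and `d_model` are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (assumed method)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Project `direction` out of the output space of `weight` (d_model x d_in)."""
    # W' = W - r (r^T W), so that r^T W' = 0 for a unit vector r.
    return weight - np.outer(direction, direction @ weight)

# Synthetic stand-ins: 256 activations per prompt set, toy hidden size.
rng = np.random.default_rng(0)
d_model = 64
harmful = rng.normal(size=(256, d_model)) + 2.0 * np.eye(d_model)[0]
harmless = rng.normal(size=(256, d_model))

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_model))  # stand-in for a matrix writing to the residual stream
W_ablated = ablate_direction(W, r)
```

After ablation, the matrix can no longer write anything along `r`, which is the sense in which the ablation is orthogonal. In a real model the same projection would be applied to every matrix that writes into the residual stream, which is an assumption here rather than something this card states.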
The model still refuses some instructions related to violence; I suspect that a full fine-tune might be needed to remove the remaining refusals. Use this model responsibly. I decline any liability resulting from the use of this model.
I will post the code later.