metadata
base_model: google/gemma-2-27b-it
pipeline_tag: text-generation
license: apache-2.0
language:
  - en
tags:
  - gemma
  - gemma-2
  - chat
  - it
  - abliterated
library_name: transformers

NOTE: This model is currently a work in progress (WIP).

Abliteration method:

  1. Obtain refusal direction with llama-cpp-python.
  2. Orthogonalization is performed with torch directly on the .safetensors files, one file at a time.
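
The orthogonalization step above can be sketched as follows. This is a minimal illustration, not the exact code used for this model: the function name `orthogonalize`, the tensor shapes, and the choice of which weight matrices to modify are all assumptions, since this card does not specify them. The idea is to subtract from a weight matrix the part of its output that points along the unit-length refusal direction.

```python
import torch


def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix's output space.

    weight:    (d_model, d_in) matrix whose output is written to the residual stream
    direction: (d_model,) refusal direction (hypothetical input, need not be normalized)
    """
    # Work in float32 for numerical stability, then cast back to the original dtype.
    w = weight.to(torch.float32)
    r = direction.to(torch.float32)
    r = r / r.norm()
    # W' = W - r (r^T W): removes the component of every output along r.
    w = w - torch.outer(r, r @ w)
    return w.to(weight.dtype)


# Small synthetic demo; real weights would come from the model's .safetensors files.
torch.manual_seed(0)
W = torch.randn(8, 4)
r = torch.randn(8)
W_abl = orthogonalize(W, r)
```

In practice, each .safetensors file could be loaded with `safetensors.torch.load_file`, have the relevant tensors passed through a function like this, and be written back with `safetensors.torch.save_file`, so that only one file is held in memory at a time.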

It is a rather large model, so it may take me another day or two to figure out which layer I should be using for the direction vector.

First attempt: Layer 20 was used to obtain the refusal direction vector. Refusal mitigation partially worked but was not perfect.

Second attempt (current): Layer 23 was used (the mid-point of the model). The half-way layer has proven to work with other models, so this should be fine.
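
For readers unfamiliar with how a direction vector is derived from a chosen layer, here is a hedged sketch. It assumes the common difference-of-means approach: collect hidden states at the chosen layer for prompts the model refuses and for prompts it answers, then take the normalized difference of the two means. This card does not confirm that this exact formula was used, and the function name `refusal_direction` and the synthetic activations are hypothetical.

```python
import torch


def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Unit vector from the mean harmless activation toward the mean harmful one.

    Both inputs are (n_prompts, d_model) hidden states taken at one layer
    (for example layer 23), assumed to be collected beforehand.
    """
    diff = harmful_acts.float().mean(dim=0) - harmless_acts.float().mean(dim=0)
    return diff / diff.norm()


# Synthetic demo: "harmful" activations are the harmless ones shifted along axis 0.
torch.manual_seed(0)
harmless = torch.randn(16, 8)
offset = torch.zeros(8)
offset[0] = 3.0
harmful = harmless + offset
direction = refusal_direction(harmful, harmless)
```

Trying a different layer, as in the two attempts above, amounts to collecting the activations at that layer instead and recomputing the direction.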

gemma-2-27b-it-abliterated

Check out the Jupyter notebook for details of how this model was abliterated from glm-4-9b-chat.