# Phi4 Abliterated

This is **Phi4 abliterated** using a new methodology (why has nobody tried this before?) aimed at improving its usability and neutrality.

## Goal

The objective is to create a model that is **neutral**:

- **Not uncensored**, but one that no longer refuses the neutral prompts it would ordinarily reject.
- A neutral baseline from which further fine-tuning can reduce censorship.

## Original Methodology

In the original implementation:

1. Harmful and harmless prompts were compared on **one specific layer** of the model.
2. The refusal direction computed there was then applied to **all layers**.

### Problem

The resulting model:

- Became **less usable** and somewhat "dumb."
- Likely because a single refusal direction was applied uniformly across all layers, ignoring what each layer actually represents.

## New Approach

In my fork, available here:
👉 [https://github.com/Undi95/abliteration/](https://github.com/Undi95/abliteration/)
(based on the original [https://github.com/Orion-zhen/abliteration.git](https://github.com/Orion-zhen/abliteration.git))

I introduced a new approach (sketched in code at the end of this card):

- Each layer gets its **own refusal direction**.
- The refusal direction is **layer-specific**, reflecting the assumption that each layer has different characteristics and requirements.

## Hypothesis

Computing the direction per layer avoids over-generalizing the refusal direction and lets each layer keep its unique properties. The expected result:

- A more **usable** and **intelligent** model.
- A neutral starting point for further fine-tuning that reduces censorship without compromising performance.

## Next Steps

After applying this method, the model can be fine-tuned to:

- Reduce over-censoring behavior.
- Maintain neutrality while improving overall utility.
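
## Sketch of the Per-Layer Method

To make the per-layer idea concrete, here is a minimal sketch of the difference between the two methods. It assumes you have already collected per-layer mean hidden states for harmful and harmless prompts (e.g., via `output_hidden_states=True` in transformers); the function and variable names below are illustrative, not the actual API of either repository.

```python
import torch

def refusal_direction(harmful_mean: torch.Tensor,
                      harmless_mean: torch.Tensor) -> torch.Tensor:
    """Unit vector pointing from harmless toward harmful activations."""
    direction = harmful_mean - harmless_mean
    return direction / direction.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a layer's output weight.

    Left-multiplying by (I - d d^T) means the layer can no longer
    write anything along `direction` into the residual stream.
    """
    d = direction.to(weight.dtype)
    return weight - torch.outer(d, d) @ weight

def abliterate_per_layer(layer_weights, harmful_means, harmless_means):
    """Per-layer variant: each layer is edited with a direction computed
    from its own activations, instead of reusing one direction taken
    from a single layer for the whole model."""
    return [
        ablate(w, refusal_direction(h, b))
        for w, h, b in zip(layer_weights, harmful_means, harmless_means)
    ]

# Toy usage on random data (hidden size 8, 2 layers):
weights = [torch.randn(8, 8) for _ in range(2)]
harmful = [torch.randn(8) for _ in range(2)]
harmless = [torch.randn(8) for _ in range(2)]
edited = abliterate_per_layer(weights, harmful, harmless)
```

The projection `W - d dᵀ W` zeroes the component of a layer's output along `d`. The original method reuses one `d` everywhere, which is exactly the over-generalization described above; the per-layer variant derives each `d` from that layer's own activations.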