ExLLaMA V2 quant of CodeBooga-34B-v0.1-4.0bpw-h6-exl2
license: llama2

CodeBooga-34B-v0.1

This is a merge between the following two models:

  1. Phind-CodeLlama-34B-v2
  2. WizardCoder-Python-34B-V1.0

It was created with the BlockMerge Gradient script, the same one that was used to create MythoMax-L2-13b, and with the same settings. The following YAML was used:

model_path1: "Phind_Phind-CodeLlama-34B-v2_safetensors"
model_path2: "WizardLM_WizardCoder-Python-34B-V1.0_safetensors"
output_model_path: "CodeBooga-34B-v0.1"
operations:
  - operation: lm_head # Single tensor
    filter: "lm_head"
    gradient_values: [0.75]
  - operation: embed_tokens # Single tensor
    filter: "embed_tokens"
    gradient_values: [0.75]
  - operation: self_attn
    filter: "self_attn"
    gradient_values: [0.75, 0.25]
  - operation: mlp
    filter: "mlp"
    gradient_values: [0.25, 0.75]
  - operation: layernorm
    filter: "layernorm"
    gradient_values: [0.5, 0.5]
  - operation: modelnorm # Single tensor
    filter: "model.norm"
    gradient_values: [0.75]
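
To make the gradient_values lists more concrete, here is a minimal Python sketch of one plausible reading: the listed values are spread by linear interpolation across the tensors matched by a filter, in order, and a single value applies uniformly. This is an assumption about how the script behaves, not a copy of its code. The function name `gradient_ratios` and the 5-tensor example are hypothetical, and the sketch deliberately does not say which of the two models a given ratio weights.

```python
def gradient_ratios(gradient_values, num_tensors):
    """Spread gradient_values across num_tensors by linear interpolation.

    Assumption: this mirrors the intent of a "gradient" merge, not the
    actual BlockMerge Gradient implementation.
    """
    if num_tensors <= 0:
        return []
    # A single value (or a single tensor) gets one uniform ratio.
    if len(gradient_values) == 1 or num_tensors == 1:
        return [float(gradient_values[0])] * num_tensors

    segments = len(gradient_values) - 1
    ratios = []
    for i in range(num_tensors):
        # Position of tensor i along the list of gradient values.
        pos = i / (num_tensors - 1) * segments
        lo = min(int(pos), segments - 1)
        frac = pos - lo
        ratios.append(gradient_values[lo] * (1 - frac) + gradient_values[lo + 1] * frac)
    return ratios


# Hypothetical example: the self_attn gradient over 5 matched tensors.
print(gradient_ratios([0.75, 0.25], 5))  # [0.75, 0.625, 0.5, 0.375, 0.25]
```

Under this reading, the self_attn and mlp entries in the YAML run in opposite directions across the depth of the network, while the layernorm entry stays at a constant 0.5.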

Prompt format

Both base models use the Alpaca format, so it should be used for this one as well.

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Your instruction

### Response:
Bot reply

### Instruction:
Another instruction

### Response:
Bot reply
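
The template above can be assembled programmatically. The following is a minimal sketch, not an official helper: the names `build_alpaca_prompt` and `history` are hypothetical, and ending the prompt with an open "### Response:" line so that the model writes the next reply is an assumption based on common usage of this format.

```python
SYSTEM_LINE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_alpaca_prompt(instruction, history=()):
    """Build an Alpaca-style prompt.

    history is a sequence of (instruction, response) pairs from earlier turns.
    The returned string ends with an open "### Response:" line (assumption).
    """
    parts = [SYSTEM_LINE]
    for past_instruction, past_response in history:
        parts.append(f"### Instruction:\n{past_instruction}")
        parts.append(f"### Response:\n{past_response}")
    parts.append(f"### Instruction:\n{instruction}")
    parts.append("### Response:\n")
    # Blocks are separated by one blank line, as in the template.
    return "\n\n".join(parts)


prompt = build_alpaca_prompt(
    "Another instruction",
    history=[("Your instruction", "Bot reply")],
)
print(prompt)
```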

Evaluation

I ran a quick experiment in which I asked a set of 3 Python and 3 JavaScript questions (real-world, difficult questions with nuance) to the following models:

  1. This one
  2. A second variant generated with model_path1 and model_path2 swapped in the YAML above, which I called CodeBooga-Reversed-34B-v0.1
  3. WizardCoder-Python-34B-V1.0
  4. Phind-CodeLlama-34B-v2

Specifically, I used 4.250b EXL2 quantizations of each. I then ranked the responses to each question by quality and assigned the following scores:

  • 4th place: 0
  • 3rd place: 1
  • 2nd place: 2
  • 1st place: 4

The resulting cumulative scores were:

  • CodeBooga-34B-v0.1: 22
  • WizardCoder-Python-34B-V1.0: 12
  • Phind-CodeLlama-34B-v2: 7
  • CodeBooga-Reversed-34B-v0.1: 1

CodeBooga-34B-v0.1 performed very well, while its variant performed poorly, so I uploaded the former but not the latter.
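
The scoring scheme can be written down in a few lines of Python. This is a sketch for illustration only: the per-question rankings were not published, so the function is shown without real ranking data, and the name `cumulative_scores` is hypothetical. The reported totals are copied from the list above and checked for internal consistency.

```python
# Points awarded for each place, as listed above.
POINTS_BY_PLACE = {1: 4, 2: 2, 3: 1, 4: 0}


def cumulative_scores(rankings):
    """Sum points over questions.

    rankings is a list with one entry per question; each entry lists the
    four model names ordered from best (1st place) to worst (4th place).
    """
    totals = {}
    for ranking in rankings:
        for place, model in enumerate(ranking, start=1):
            totals[model] = totals.get(model, 0) + POINTS_BY_PLACE[place]
    return totals


# Totals reported in this model card.
reported = {
    "CodeBooga-34B-v0.1": 22,
    "WizardCoder-Python-34B-V1.0": 12,
    "Phind-CodeLlama-34B-v2": 7,
    "CodeBooga-Reversed-34B-v0.1": 1,
}

num_questions = 6  # 3 Python + 3 JavaScript
points_per_question = sum(POINTS_BY_PLACE.values())  # 4 + 2 + 1 + 0 = 7
max_possible = num_questions * POINTS_BY_PLACE[1]    # 24 for a model that always wins

# Every question hands out 7 points, so the totals must add up to 42.
assert sum(reported.values()) == num_questions * points_per_question
```

The check passes (22 + 12 + 7 + 1 = 42), and the top score of 22 is close to the maximum of 24 that a model would earn by placing first on every question.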

Recommended settings

I recommend the Divine Intellect preset for instruction-following models like this, as per the Preset Arena experiment results:

temperature: 1.31
top_p: 0.14
repetition_penalty: 1.17
top_k: 49
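
For use outside a preset file, the same values can be kept in a plain Python dictionary. This is a minimal sketch under stated assumptions: the helper name `sampling_kwargs` is hypothetical, and the added `do_sample` flag is an assumption for backends such as the Hugging Face transformers `generate` method, which accepts these parameter names but only samples when that flag is set. Other backends may name or handle these settings differently.

```python
# Values copied from the preset above.
DIVINE_INTELLECT = {
    "temperature": 1.31,
    "top_p": 0.14,
    "repetition_penalty": 1.17,
    "top_k": 49,
}


def sampling_kwargs(preset, **overrides):
    """Return a copy of the preset as keyword arguments for a sampler.

    do_sample=True is an assumption for transformers-style backends;
    any keyword passed in overrides replaces the preset value.
    """
    kwargs = {"do_sample": True, **preset, **overrides}
    # Basic sanity checks on the merged values.
    if kwargs["temperature"] <= 0:
        raise ValueError("temperature must be positive")
    if not 0.0 < kwargs["top_p"] <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if kwargs["top_k"] < 0:
        raise ValueError("top_k must be non-negative")
    return kwargs


print(sampling_kwargs(DIVINE_INTELLECT))
```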

Quantized versions

EXL2

A 4.250b EXL2 version of the model can be found here:

https://huggingface.co/oobabooga/CodeBooga-34B-v0.1-EXL2-4.250b

GGUF

TheBloke has kindly provided GGUF quantizations for llama.cpp:

https://huggingface.co/TheBloke/CodeBooga-34B-v0.1-GGUF