---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

This is a 4-bit quantized version of Qwen2.5-Coder-7B-Instruct, converted with bitsandbytes. For more information about the original model, refer to its model page.
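A minimal loading sketch with transformers is shown below; it assumes bitsandbytes and accelerate are installed and that the 4-bit quantization configuration is stored in the checkpoint, as is usual for bitsandbytes-serialized models:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cmarkea/Qwen2.5-Coder-7B-Instruct-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The bitsandbytes 4-bit quantization config travels with the checkpoint,
# so the weights are loaded directly in 4-bit on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
```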

## Impact on performance

*Figure: impact of quantization on a set of models.*

We evaluated the models using the PoLL (Pool of LLM) technique, in which a panel of large judge models (GPT-4o, Gemini Pro 1.5, and Claude-Sonnet 3.5) grades each answer. Scores range from 0, indicating a model unsuitable for the task, to 5, representing a model that fully meets expectations. The evaluation was based on 67 instructions across four programming languages: Python, Java, JavaScript, and pseudo-code. All tests were conducted in a French-language context, and models were heavily penalized if they responded in another language, even when the response was technically correct.
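For illustration, here is a minimal sketch of how a per-model score could be aggregated under this protocol; the grade values and the data layout are hypothetical, and only the 0-5 scale and the panel of judges come from the description above:

```python
import statistics

# Hypothetical layout: one 0-5 grade per (judge, instruction) pair for a given model.
grades = {
    "gpt-4o":           [5, 4, 4, 5],
    "gemini-1.5-pro":   [4, 4, 5, 5],
    "claude3.5-sonnet": [5, 3, 4, 5],
}

# PoLL-style score: average each judge's grades over all instructions,
# then average across the panel of judges.
per_judge_means = [statistics.mean(scores) for scores in grades.values()]
final_score = statistics.mean(per_judge_means)
print(f"model score: {final_score:.2f} / 5")
```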

Performance Scores (on a scale of 5):

| Model | Score | # params (Billion) | Size (GB) |
|---|---|---|---|
| gemini-1.5-pro | 4.51 | NA | NA |
| gpt-4o | 4.51 | NA | NA |
| claude3.5-sonnet | 4.49 | NA | NA |
| Qwen/Qwen2.5-Coder-32B-Instruct | 4.41 | 32.8 | 65.6 |
| Qwen/Qwen2.5-32B-Instruct | 4.40 | 32.8 | 65.6 |
| cmarkea/Qwen2.5-32B-Instruct-4bit | 4.36 | 32.8 | 16.4 |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | 4.24 | 15.7 | 31.4 |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 4.23 | 70.06 | 141.2 |
| cmarkea/Meta-Llama-3.1-70B-Instruct-4bit | 4.14 | 70.06 | 35.3 |
| Qwen/Qwen2.5-Coder-7B-Instruct | 4.11 | 7.62 | 15.24 |
| cmarkea/Qwen2.5-Coder-7B-Instruct-4bit | 4.08 | 7.62 | 3.81 |
| cmarkea/Mixtral-8x7B-Instruct-v0.1-4bit | 3.8 | 46.7 | 23.35 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 3.73 | 8.03 | 16.06 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 3.33 | 46.7 | 93.4 |
| codellama/CodeLlama-13b-Instruct-hf | 3.33 | 13 | 26 |
| codellama/CodeLlama-34b-Instruct-hf | 3.27 | 33.7 | 67.4 |
| codellama/CodeLlama-7b-Instruct-hf | 3.19 | 6.74 | 13.48 |
| cmarkea/CodeLlama-34b-Instruct-hf-4bit | 3.12 | 33.7 | 16.35 |
| codellama/CodeLlama-70b-Instruct-hf | 1.82 | 69 | 138 |
| cmarkea/CodeLlama-70b-Instruct-hf-4bit | 1.64 | 69 | 34.5 |

The impact of 4-bit quantization on performance is negligible: cmarkea/Qwen2.5-Coder-7B-Instruct-4bit scores 4.08 versus 4.11 for the original model, for a quarter of the memory footprint (3.81 GB vs 15.24 GB).

## Prompt Pattern

Here is a reminder of the prompt pattern used to interact with the model:

```
<|im_start|>user\n{user_prompt_1}<|im_end|><|im_start|>assistant\n{model_answer_1}<|im_end|>...
```
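In practice this pattern does not need to be written by hand: the tokenizer's chat template produces it. Below is a minimal sketch; the user message is only an example:

```python
from transformers import AutoTokenizer

model_id = "cmarkea/Qwen2.5-Coder-7B-Instruct-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Écris une fonction Python qui inverse une chaîne."},
]

# Renders the <|im_start|>user ... <|im_end|><|im_start|>assistant pattern above,
# appending the assistant header so the model knows it should answer next.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```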