Suparious committed on
Commit 425353d · verified · 1 Parent(s): 8d02893

Add processing notice

Files changed (1)
  1. README.md +2 -80
README.md CHANGED
@@ -1,91 +1,13 @@
  ---
- base_model: mlabonne/ChimeraLlama-3-8B-v3
  inference: false
- library_name: transformers
- license: other
- merged_models:
- - NousResearch/Meta-Llama-3-8B-Instruct
- - mlabonne/OrpoLlama-3-8B
- - cognitivecomputations/dolphin-2.9-llama3-8b
- - Danielbrdz/Barcenas-Llama3-8b-ORPO
- - VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- - vicgalle/Configurable-Llama-3-8B-v0.3
- - MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3
- pipeline_tag: text-generation
- quantized_by: Suparious
- tags:
- - 4-bit
- - AWQ
- - text-generation
- - autotrain_compatible
- - endpoints_compatible
- - merge
- - mergekit
- - lazymergekit
  ---
  # mlabonne/ChimeraLlama-3-8B-v3 AWQ

+ ** PROCESSING .... ETA 30mins **
+
  - Model creator: [mlabonne](https://huggingface.co/mlabonne)
  - Original model: [ChimeraLlama-3-8B-v3](https://huggingface.co/mlabonne/ChimeraLlama-3-8B-v3)

- ## Model Summary
-
- ChimeraLlama-3-8B-v3 is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
- * [NousResearch/Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
- * [mlabonne/OrpoLlama-3-8B](https://huggingface.co/mlabonne/OrpoLlama-3-8B)
- * [cognitivecomputations/dolphin-2.9-llama3-8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b)
- * [Danielbrdz/Barcenas-Llama3-8b-ORPO](https://huggingface.co/Danielbrdz/Barcenas-Llama3-8b-ORPO)
- * [VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct)
- * [vicgalle/Configurable-Llama-3-8B-v0.3](https://huggingface.co/vicgalle/Configurable-Llama-3-8B-v0.3)
- * [MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3)
-
- ## How to use
-
- ### Install the necessary packages
-
- ```bash
- pip install --upgrade autoawq autoawq-kernels
- ```
-
- ### Example Python code
-
- ```python
- from awq import AutoAWQForCausalLM
- from transformers import AutoTokenizer, TextStreamer
-
- model_path = "solidrust/ChimeraLlama-3-8B-v3-AWQ"
- system_message = "You are ChimeraLlama-3-8B-v3, incarnated as a powerful AI. You were created by mlabonne."
-
- # Load model
- model = AutoAWQForCausalLM.from_quantized(model_path,
-                                           fuse_layers=True)
- tokenizer = AutoTokenizer.from_pretrained(model_path,
-                                           trust_remote_code=True)
- streamer = TextStreamer(tokenizer,
-                         skip_prompt=True,
-                         skip_special_tokens=True)
-
- # Convert prompt to tokens
- prompt_template = """\
- <|im_start|>system
- {system_message}<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant"""
-
- prompt = "You're standing on the surface of the Earth. "\
-          "You walk one mile south, one mile west and one mile north. "\
-          "You end up exactly where you started. Where are you?"
-
- tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
-                    return_tensors='pt').input_ids.cuda()
-
- # Generate output
- generation_output = model.generate(tokens,
-                                    streamer=streamer,
-                                    max_new_tokens=512)
- ```
-
  ### About AWQ

  AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.
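Once processing completes, the quantized weights should load the same way as before this commit. Below is a minimal sketch condensed from the usage section removed above, assuming the weights land at `solidrust/ChimeraLlama-3-8B-v3-AWQ` (the path used in the removed example) and a CUDA device is available; the one substitution is the tokenizer's `apply_chat_template` in place of the hard-coded ChatML template from the removed code.

```python
# Minimal sketch, condensed from the usage section removed in this commit.
# Assumes the quantized weights are available at the path below (taken from
# the removed example) and that a CUDA device is present.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/ChimeraLlama-3-8B-v3-AWQ"

# Load the 4-bit AWQ weights and the matching tokenizer
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Build the prompt with the tokenizer's own chat template (a substitution for
# the hard-coded ChatML template in the removed example)
messages = [{"role": "user", "content": "Why is the sky blue?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and stream the generated answer
tokens = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
model.generate(tokens, streamer=streamer, max_new_tokens=512)
```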