mikudev committed
Commit f37915d · verified · 1 Parent(s): 5f9b4a4

Create README.md

Files changed (1)
  1. README.md +43 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ tags:
+ - fp8
+ license: apache-2.0
+ language:
+ - en
+ base_model: Sao10K/MN-12B-Lyra-v1
+ ---
+
+ Original Model: https://huggingface.co/Sao10K/MN-12B-Lyra-v1
+
+ Quantized to FP8 using https://github.com/neuralmagic/AutoFP8
+
+ Quantization script:
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig
+
+ pretrained_model_dir = "Sao10K/MN-12B-Lyra-v1"
+ quantized_model_dir = "MN-12B-Lyra-v1-FP8"
+
+ tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, model_max_length=4096)
+ tokenizer.pad_token = tokenizer.eos_token  # pad batches with the EOS token
+
+ ds = load_dataset("mgoin/ultrachat_2k", split="train_sft").select(range(512))
+ examples = [tokenizer.apply_chat_template(batch["messages"], tokenize=False) for batch in ds]
+ examples = tokenizer(examples, padding=True, truncation=True, return_tensors="pt").to("cuda")
+
+ quantize_config = BaseQuantizeConfig(
+     quant_method="fp8",
+     activation_scheme="static",  # activation scales are calibrated ahead of time
+     ignore_patterns=["re:.*lm_head"],  # leave lm_head unquantized
+ )
+
+ model = AutoFP8ForCausalLM.from_pretrained(
+     pretrained_model_dir, quantize_config=quantize_config
+ )
+
+ model.quantize(examples)  # calibrate on the tokenized samples, then quantize
+ model.save_quantized(quantized_model_dir)
+ ```
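
The `ignore_patterns=["re:.*lm_head"]` entry in the script above decides which layers are skipped during quantization. Below is a minimal sketch of how such a pattern list could be interpreted, assuming that a `re:` prefix marks a regular expression searched against the module name and that any other entry is an exact name match. The helper `layers_to_ignore` and the module names are hypothetical illustrations, not part of AutoFP8's API.

```python
import re

REGEX_PREFIX = "re:"  # assumed marker for regular-expression patterns


def layers_to_ignore(layer_names, ignore_patterns):
    """Return the layer names selected by any pattern, preserving input order."""
    ignored = []
    for name in layer_names:
        for pattern in ignore_patterns:
            if pattern.startswith(REGEX_PREFIX):
                # Regex pattern: search it anywhere in the module name.
                matched = re.search(pattern[len(REGEX_PREFIX):], name) is not None
            else:
                # Plain pattern: require an exact name match.
                matched = pattern == name
            if matched:
                ignored.append(name)
                break  # one matching pattern is enough
    return ignored


# Hypothetical module names in the style of a decoder-only transformer.
layer_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

print(layers_to_ignore(layer_names, ["re:.*lm_head"]))  # ['lm_head']
```

Under these assumptions only `lm_head` is selected, so the output projection would keep its original precision while the remaining layers are candidates for FP8 conversion.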