---
license: llama3.2
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
tags:
- gptqmodel
- modelcloud
- llama3.2
- instruct
- int4
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/641c13e7999935676ec7bc03/Zs93RWyb5wIpodPEwgR2g.png)

This model was quantized with [GPTQModel](https://github.com/ModelCloud/GPTQModel) using the following settings:

- **bits**: 4
- **dynamic**: null
- **group_size**: 32
- **desc_act**: true
- **static_groups**: false
- **sym**: true
- **lm_head**: false
- **true_sequential**: true
- **quant_method**: "gptq"
- **checkpoint_format**: "gptq"
- **meta**:
  - **quantizer**: gptqmodel:1.1.0
  - **uri**: https://github.com/modelcloud/gptqmodel
  - **damp_percent**: 0.05
  - **damp_auto_increment**: 0.0015

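For reference, the settings above can be collected into a dict with the same shape as a `quantize_config.json`. This is a reconstruction from the list above for illustration only; the `quantize_config.json` shipped in the repository is authoritative.

```python
import json

# Reconstructed from the quantization settings listed above;
# the quantize_config.json bundled with the model is authoritative.
quantize_config = {
    "bits": 4,                    # 4-bit weight quantization
    "dynamic": None,
    "group_size": 32,             # quantization group size
    "desc_act": True,             # activation-order (act-order) quantization
    "static_groups": False,
    "sym": True,                  # symmetric quantization
    "lm_head": False,             # lm_head left unquantized
    "true_sequential": True,
    "quant_method": "gptq",
    "checkpoint_format": "gptq",
    "meta": {
        "quantizer": "gptqmodel:1.1.0",
        "uri": "https://github.com/modelcloud/gptqmodel",
        "damp_percent": 0.05,
        "damp_auto_increment": 0.0015,
    },
}

print(json.dumps(quantize_config, indent=2))
```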
## Example

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortext-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTQModel.from_quantized(model_name)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
# Build the chat prompt and tokenize it in one step
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

print(result)
```