abhinavkulkarni committed
Commit: 01ab93b
Parent(s): b2720ce
Update README.md

README.md CHANGED
@@ -29,6 +29,8 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 80 or higher.
 
+For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
+
 ## How to Use
 
 ```bash
@@ -65,7 +67,7 @@ q_config = {
 load_quant = hf_hub_download('abhinavkulkarni/psmathur-orca_mini_v2_7b-w4-g128-awq', 'pytorch_model.bin')
 
 with init_empty_weights():
-    model = AutoModelForCausalLM.
+    model = AutoModelForCausalLM.from_config(config=config,
         torch_dtype=torch.float16, trust_remote_code=True)
 
 real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)