crumb committed
Commit df6df38 · Parent: b550ff7

Update README.md

Files changed (1): README.md +5 -3
README.md CHANGED
@@ -28,6 +28,8 @@ prompt.
 
 ### Usage
 
+Inference on GPU with 4-bit quantization:
+
 ```
 %pip install -qq transformers accelerate bitsandbytes
 ```
@@ -56,7 +58,7 @@ model = AutoModelForCausalLM.from_pretrained(
 ```python
 inputs = tokenizer("Once upon a time,", return_tensors='pt')
 inputs = {
-    k:v.cpu() for k,v in inputs.items()
+    k:v.cuda() for k,v in inputs.items()
 }
 outputs = model.generate(
     **inputs,
@@ -68,8 +70,8 @@ tokenizer.decode(outputs[0])
 ```
 
 TODO
-- test to see if model works with .from_pretrained <br>
-- test fp32, fp16, 8 and 4 bit
+- ~~test to see if model works with .from_pretrained~~ <br>
+- ~~test fp32, fp16, 8 and 4 bit~~
 - shard model to max 1gb for use in even lower vram settings <br>
 - safetensors <br>
 - upload bf16 version of model <br>
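
For readers following along, here is a minimal end-to-end sketch of the usage flow the updated README describes, assuming the "4-bit quantization" in the new intro line means bitsandbytes 4-bit loading via `load_in_4bit`; the repo id below is a placeholder, not the actual model name:

```python
# Hypothetical sketch of the README's usage flow. Assumptions: the repo id
# is a placeholder, and load_in_4bit (bitsandbytes) is the 4-bit
# quantization the updated README refers to.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/<model-name>"  # placeholder; substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place the weights on the GPU
    load_in_4bit=True,   # 4-bit quantization via bitsandbytes
)

inputs = tokenizer("Once upon a time,", return_tensors="pt")
# Move the input tensors to the GPU: the cpu() to cuda() fix in this commit.
inputs = {k: v.cuda() for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0]))
```

The cuda() change matters because the model's weights live on the GPU under `device_map="auto"`, so inputs left on the CPU would raise a device-mismatch error at generation time.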