added missing imports #12
opened by bitsTobyte

README.md CHANGED
@@ -33,6 +33,8 @@ pip install git+https://github.com/huggingface/transformers.git@refs/pull/33410/
 ```
 And then load the model :
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
 
 model = AutoModelForCausalLM.from_pretrained("HF1BitLLM/Llama3-8B-1.58-100B-tokens", device_map="cuda", torch_dtype=torch.bfloat16)
 tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
@@ -40,7 +42,7 @@ tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
 input_text = "Daniel went back to the the the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:"
 
 input_ids = tokenizer.encode(input_text, return_tensors="pt").cuda()
-output = model.generate(input_ids, max_length=
+output = model.generate(input_ids, max_length=64, do_sample=False)
 generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
 print(generated_text)
 ```
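For reference, here is the full snippet as it reads with this PR applied; a minimal consolidation, assuming a CUDA-capable GPU and the `transformers` build installed from the `pip install` line above:

```python
# The README snippet after this PR, consolidated into one runnable script.
# Assumes a CUDA-capable GPU and the transformers branch referenced in the
# pip install line above (the BitNet-format checkpoint needs that build).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "HF1BitLLM/Llama3-8B-1.58-100B-tokens",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

input_text = (
    "Daniel went back to the the the garden. Mary travelled to the kitchen. "
    "Sandra journeyed to the kitchen. Sandra went to the hallway. "
    "John went to the bedroom. Mary went back to the garden. "
    "Where is Mary?\nAnswer:"
)

# Tokenize the prompt and move it to the GPU alongside the model.
input_ids = tokenizer.encode(input_text, return_tensors="pt").cuda()
output = model.generate(input_ids, max_length=64, do_sample=False)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

With `do_sample=False`, generation is greedy, so the printed answer is deterministic for a given checkpoint.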