wonhosong committed
Commit
63f82b2
1 Parent(s): 0dcf44f

Update README.md

Files changed (1):
  1. README.md +27 -0
README.md CHANGED
@@ -41,6 +41,33 @@ pipeline_tag: text-generation
  {Assistant}
  ```

+ ## Usage
+
+ - Tested on A100 80GB
+ - Our model can handle up to 10k input tokens, thanks to the `rope_scaling` option
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+ tokenizer = AutoTokenizer.from_pretrained("upstage/llama-65b-instruct")
+ model = AutoModelForCausalLM.from_pretrained(
+     "upstage/llama-65b-instruct",
+     device_map="auto",
+     torch_dtype=torch.float16,
+     load_in_8bit=True,
+     rope_scaling={"type": "dynamic", "factor": 2}  # allows handling of longer inputs
+ )
+
+ prompt = "### User:\nThomas is healthy, but he has to go to the hospital. What could be the reasons?\n\n### Assistant:\n"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ inputs.pop("token_type_ids", None)  # LLaMA does not use token_type_ids
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+ output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=float('inf'))  # unbounded; pass a finite int to cap output length
+ output_text = tokenizer.decode(output[0], skip_special_tokens=True)
+ ```
+
  ## Hardware and Software

  * **Hardware**: We utilized an A100x8 * 4 for training our model
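
One note on the snippet added above: for decoder-only models, `model.generate` returns the prompt tokens followed by the generated tokens, so the decoded `output_text` repeats the prompt. A minimal sketch (not part of the commit, reusing `inputs`, `output`, and `tokenizer` from the snippet) that keeps only the completion:

```python
# Slice off the prompt tokens before decoding, keeping only the model's reply.
prompt_len = inputs["input_ids"].shape[-1]  # number of tokens in the prompt
completion = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
print(completion)
```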