francislabounty committed
Commit 2847e85 • Parent(s): 357aeab
Update README.md

README.md CHANGED
@@ -66,4 +66,11 @@ inputs = tokenizer(prompt, return_tensors="pt")
 inputs = inputs.to(model.device)
 pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
 print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
-```
+```
+
+## Other Information
+Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)
+[Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)
+[Forked repo with mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)
+
+If you are interested in faster inferencing, check out our [fork of vLLM](https://github.com/serp-ai/vllm) that adds sparsetral support
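For readers who want to run the snippet end to end, the generation lines in this diff fit into a standard `transformers` loading flow. The following is a minimal sketch only; the model id, dtype, and prompt are placeholders and are not taken from this commit:

```python
# Minimal end-to-end sketch around the generation call shown in the diff.
# The model id, dtype, and prompt are placeholders; substitute the values from this repository's README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "serp-ai/sparsetral-16x7B-v2"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed dtype
    device_map="auto",
    trust_remote_code=True,       # likely needed if the repo ships custom MoE modeling code
)

prompt = "Explain mixture-of-experts in one paragraph."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling parameters mirror the README snippet: top-k/top-p sampling at temperature 0.9.
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50,
                      top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```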
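The vLLM note at the end of the diff points to the linked fork for faster serving. As a rough sketch, assuming the fork keeps vLLM's standard offline API and again using a placeholder model id:

```python
# Offline generation via vLLM's standard API; the sparsetral-enabled fork linked above
# is assumed to expose the same interface. Model id and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="serp-ai/sparsetral-16x7B-v2", trust_remote_code=True)  # placeholder model id
params = SamplingParams(temperature=0.9, top_p=0.99, top_k=50, max_tokens=4096)

outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

The sampling settings mirror the transformers snippet above; note that vLLM's `max_tokens` counts generated tokens, whereas `max_length` in `generate` counts the total sequence length, so adjust as needed.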