Commit cfd2343 by Alignment-Lab-AI (parent: 1bb132d): Update README.md
55 |
|
56 |
## Chat Template and Inference
|
57 |
|
58 |
+
To use the Buzz-8b-Medium model for chat-based tasks, you can utilize the provided chat template. Here's an example of how to perform inference using the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# ... (the model loading, prompt formatting, and generation steps are elided in this diff view) ...

print("Input:", prompt)
print("Response:", response)
```
NOTE: this is a COMPLETIONS model: by default it generates text that continues whatever you send it. It has only a *start* token, `<|begin_of_text|>`, and a *stop* token, `<|end_of_text|>`.
If you want it to hold conversations reliably, append `<|end_of_text|>\n<|begin_of_text|>assistant:` to the end of your prompt. The speaker name `assistant` is flexible and can be tailored to the type of response you want; for example, `Mathematician:` will give you a different type of response than `felon:`.
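The prompt-seeding trick described above can be sketched as a small helper. The function name is illustrative only; the tokens are the model's documented start/stop tokens:

```python
BEGIN = "<|begin_of_text|>"
END = "<|end_of_text|>"

def make_completion_prompt(user_text: str, speaker: str = "assistant") -> str:
    """Seed a completions-style model so it replies in a chosen voice.

    Closes the user's text with the stop token, then opens a new segment
    attributed to `speaker`, so the model completes that speaker's turn.
    """
    return f"{user_text}{END}\n{BEGIN}{speaker}: "

# Swap the speaker label to steer the style of the completion.
prompt = make_completion_prompt("What is 2 + 2?", speaker="Mathematician")
print(prompt)
```

The resulting string is what you would pass to the tokenizer in the inference example above.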
Later iterations of the model will likely use a chat format similar to *openchat*.
## Conclusion
We intend to focus on *updating* and improving the performance of these models and the surrounding open-source infrastructure. Our next effort will focus on context length, implementing the research currently being conducted by [Wing-Lian](https://github.com/winglian), the lead developer of the [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) training framework that underpins these experiments. We encourage the community to explore Wing-Lian's work, such as the [Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE) and [llama-3-8b-256k-PoSE](https://huggingface.co/winglian/llama-3-8b-256k-PoSE) models, which showcase the potential for further advancements in language modeling.