Update README.md
---
library_name: transformers
tags: []
---

*There is currently an issue with the **model generating random reserved special tokens (like `<|reserved_special_token_49|>`) at the end**. Please decode with `skip_special_tokens=True` (see the inference example below). We will update the model once we have found the cause of this behaviour. If you have found a solution, please let us know!*

# Llama 3 DiscoLM German 8b v0.1 Experimental
<p align="center"><img src="disco_llama.webp" width="400"></p>
# Introduction
**Llama 3 DiscoLM German 8b v0.1 Experimental** is an experimental Llama 3 based version of [DiscoLM German](https://huggingface.co/DiscoResearch/DiscoLM_German_7b_v1).
This is an experimental release and not intended for production use. The model is still in development and will be updated with new features and improvements in the future.
When tokenizing messages for generation, set `add_generation_prompt=True` when calling `apply_chat_template()`. This will append `<|im_start|>assistant\n` to your prompt, to ensure that the model continues with an assistant response.
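
To see what this does to the prompt, the following minimal sketch renders the templated conversation as a plain string instead of token ids (the messages are only illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental")

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Wer bist du?"},
]

# tokenize=False returns the rendered prompt string; with add_generation_prompt=True
# it should end with the assistant header described above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```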
# Example Code for Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Wer bist du?"},
]

# add_generation_prompt=True appends the assistant header so the model answers
# instead of continuing the user turn.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Decode only the newly generated tokens; skip_special_tokens=True also hides the
# stray reserved special tokens mentioned at the top of this card.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
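
Regarding the stray reserved special tokens mentioned at the top of this card: the decode call above already passes `skip_special_tokens=True`. As an extra, purely hypothetical safeguard, such tokens can also be stripped from already-decoded text:

```python
import re

# Hypothetical helper (not part of the original card): remove reserved special token
# strings such as "<|reserved_special_token_49|>" from decoded text.
def strip_reserved_tokens(text: str) -> str:
    return re.sub(r"<\|reserved_special_token_\d+\|>", "", text)

print(strip_reserved_tokens("Ich bin ein hilfreicher Assistent.<|reserved_special_token_49|>"))
# -> Ich bin ein hilfreicher Assistent.
```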
# Limitations & Biases