LeoLM/leo-hessianai-7b-chat


bjoernp committed on Sep 10, 2023

Commit 4a7bd7e (1 parent: 8f6a783)

Update README.md

Files changed (1)

README.md +32 -1


@@ -34,11 +34,42 @@ The model performs exceptionally well on writing, explanation and discussion tas
 - **Finetuned from:** [LeoLM/leo-hessianai-7b](https://huggingface.co/LeoLM/leo-hessianai-7b)
 - **Model type:** Causal decoder-only transformer language model
 - **Language:** English and German
-- **Demo:** [Continuations for 250 random prompts (TGI, 4bit nf4 quantization)](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-08-22_OpenAssistant_llama2-70b-oasst-sft-v10_sampling_noprefix2_nf4.json%0A)
+- **Demo:** [Web Demo]()
 - **License:** [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt)
 - **Contact:** [LAION Discord](https://discord.com/invite/eq3cAMZtCC) or [Björn Plüster](mailto:[email protected])
+## Use in 🤗Transformers
+If you want faster inference using flash-attention2, you need to install these dependencies:
+```bash
+pip install packaging ninja
+pip install flash-attn==v2.1.1 --no-build-isolation
+pip install git+https://github.com/HazyResearch/flash-attention.git@v2.1.1#subdirectory=csrc/rotary
+```
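The loading snippet in this section sets `trust_remote_code` to `True` only when flash-attn is in use. A minimal sketch of how that choice might be derived at runtime, assuming the package installs under the import name `flash_attn`; the helper `flash_attn_available` is hypothetical and not part of transformers or the model repository:

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the `flash_attn` package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


# Hypothetical flag that could be passed as `trust_remote_code=` when loading the model.
use_remote_code = flash_attn_available()
print(f"trust_remote_code={use_remote_code}")
```

This only checks that the package is discoverable; it does not verify that the compiled kernels match the installed CUDA and PyTorch versions.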
+Then load the model in transformers:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model = AutoModelForCausalLM.from_pretrained(
+    "LeoLM/leo-hessianai-7b-chat",
+    torch_dtype=torch.float16,
+    trust_remote_code=True              # True for flash-attn, else False
+)
+tokenizer = AutoTokenizer.from_pretrained("LeoLM/leo-hessianai-7b-chat")
+system_prompt = """<|im_start|>system
+Dies ist eine Unterhaltung zwischen einem intelligenten, hilfsbereitem KI-Assistenten und einem Nutzer.
+Der Assistent gibt ausführliche, hilfreiche und ehrliche Antworten.<|im_end|>
+"""
+prompt_format = "<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
+prompt = "Erkläre mir wie die Fahrradwegesituation in Hamburg ist."
+response, history = model.chat(tokenizer, prompt_format.format(prompt=prompt), history=None)
+```
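The snippet above defines `system_prompt` but does not pass it to `model.chat`. A minimal sketch of how the system block, earlier turns, and a new user message might be joined into a single ChatML string, assuming the system block is simply prepended to the dialogue; `build_chatml_prompt` is a hypothetical helper, not part of the model's remote code or of transformers:

```python
def build_chatml_prompt(system: str, turns: list[tuple[str, str]], user_msg: str) -> str:
    """Assemble a ChatML prompt from a system message, prior turns, and a new user message.

    `turns` holds completed (user, assistant) exchanges in order. The returned
    string ends with an open assistant block so the model continues from there.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for user, assistant in turns:
        parts.append(f"<|im_start|>user\n{user}<|im_end|>\n")
        parts.append(f"<|im_start|>assistant\n{assistant}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n")
    return "".join(parts)


# Example with placeholder strings; real use would pass the German system text and prompt.
example = build_chatml_prompt("SYSTEM", [], "QUESTION")
print(example)
```

With an empty `turns` list, the result is the system block followed by the same layout as `prompt_format` in the snippet above.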
 ## Prompting / Prompt Template
 Prompt dialogue template (ChatML format):