Update README.md

- Built with Meta Llama 3
- Quantized by [Astronomer](https://astronomer.io)

## MUST READ: Very Important!! Note About Untrained Special Tokens in Llama 3 Base (Non-instruct) Models & Fine-tuning Llama 3 Base

- Special tokens such as the ones used for instruct are undertrained in Llama 3 base models (discovered by Daniel Han: https://twitter.com/danielhanchen/status/1781395882925343058).
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655ad0f8727df37c77a09cb9/1U2rRrx60p1pNeeAZw8Rd.png)
- A patch function is under way; until this problem is addressed, fine-tuning this model for instruction following may produce `NaN` gradients. One known workaround is sketched after this list.
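
A common workaround for the NaN-gradient issue is to re-initialize the untrained (near-zero) embedding rows to the mean of the trained rows before fine-tuning. The snippet below is a minimal, unofficial sketch of that idea using `transformers`; the model ID and the `1e-5` norm threshold are illustrative assumptions, not part of this repo or the upcoming patch.

```python
# Unofficial sketch: re-initialize near-zero (untrained) embedding rows to the
# mean of the trained rows so they do not blow up gradients during fine-tuning.
# The model id and the norm threshold are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

with torch.no_grad():
    emb = model.get_input_embeddings().weight         # [vocab_size, hidden_size]
    untrained = emb.norm(dim=-1) < 1e-5               # rows left essentially zero at init
    if untrained.any():
        emb[untrained] = emb[~untrained].mean(dim=0)  # fill with mean of trained rows
```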

## Important Note About Serving with vLLM & oobabooga/text-generation-webui

- When serving this model with vLLM, make sure all requests include `"stop_token_ids": [128001, 128009]` to temporarily address the non-stop generation issue; see the sketch after this list.
- vLLM does not yet respect `generation_config.json`.
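
For example, with vLLM's offline `LLM` API the stop token IDs can be passed via `SamplingParams`. A minimal sketch; the model path and prompt are placeholders:

```python
# Minimal sketch using vLLM's offline API; replace the model path with
# wherever you downloaded this quantized model.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/llama-3-8b-gptq")  # placeholder path
params = SamplingParams(
    max_tokens=256,
    stop_token_ids=[128001, 128009],  # <|end_of_text|> and <|eot_id|>
)
outputs = llm.generate(["Why is the sky blue?"], params)
print(outputs[0].outputs[0].text)
```

The same IDs can be sent through vLLM's OpenAI-compatible server by including `stop_token_ids` in the request body, until vLLM reads them from `generation_config.json` itself.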