Update README.md
add info about kv cache saving
README.md CHANGED

@@ -1,4 +1,5 @@
 ---
+library_name: transformers
 datasets:
 - cerebras/SlimPajama-627B
 language:
@@ -65,6 +66,8 @@ print(response[0]["generated_text"])
 
 ## The LCKV Collection
 
+The model has 2 warmup layers, i.e. 3/22 of the KV cache of a standard TinyLlama.
+
 This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 Since the model structure has been changed, the initialization cannot inherit the performance of the TinyLlama checkpoint, but it effectively boosts the training process compared to pre-training from scratch.
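For a rough sense of what 3/22 means in memory terms, here is a back-of-the-envelope sketch. It is not from the model card: it assumes the usual TinyLlama-1.1B configuration (22 layers, 4 KV heads under grouped-query attention, head dimension 64) with an fp16 cache, and it reads the "3" as the 2 warmup layers plus the single top layer whose KV cache LCKV keeps; the helper function below is hypothetical, for illustration only.

```python
# Back-of-the-envelope KV-cache size: standard TinyLlama vs. this LCKV model.
# Assumed TinyLlama-1.1B config: 22 layers, 4 KV heads (GQA), head_dim 64, fp16.

def kv_cache_bytes(num_layers, seq_len, batch_size=1,
                   num_kv_heads=4, head_dim=64, dtype_bytes=2):
    """Total bytes for the K and V tensors across `num_layers` decoder layers."""
    return 2 * num_layers * batch_size * seq_len * num_kv_heads * head_dim * dtype_bytes

seq_len = 2048  # TinyLlama's context length

standard = kv_cache_bytes(num_layers=22, seq_len=seq_len)  # every layer caches KV
lckv = kv_cache_bytes(num_layers=3, seq_len=seq_len)       # 2 warmup layers + top layer

print(f"standard TinyLlama: {standard / 2**20:.1f} MiB")   # 44.0 MiB
print(f"LCKV (this model):  {lckv / 2**20:.1f} MiB")       # 6.0 MiB
print(f"ratio:              {lckv / standard:.3f}")        # 0.136 ~= 3/22
```

The absolute saving grows linearly with sequence length and batch size, but the ratio stays fixed at 3/22, since only the number of KV-caching layers changes.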