whynlp committed
Commit bd5aba4 · verified · 1 Parent(s): d5a8b5e

Update README.md


add info about kv cache saving

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -1,4 +1,5 @@
 ---
+library_name: transformers
 datasets:
 - cerebras/SlimPajama-627B
 language:
@@ -65,6 +66,8 @@ print(response[0]["generated_text"])
 
 ## The LCKV Collection
 
+The model has 2 warmup layers, i.e. 3/22 of the KV cache of a standard TinyLlama.
+
 This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 Since the model structure has been changed, the initialization cannot inherit the performance of the TinyLlama checkpoint, but it effectively boosts the training process compared to pre-training from scratch.
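
For context on the added KV-cache note: reading "2 warmup layers, i.e. 3/22" as the 2 warmup layers plus the top layer each keeping their own keys and values, only 3 of TinyLlama's 22 layers hold a cache. Below is a minimal sketch of that arithmetic, not code from this repository; it assumes TinyLlama-1.1B's published attention shape (4 KV heads, head dimension 64) and an fp16 cache, and the function name and defaults are illustrative only.

```python
# Rough sketch (not from the repository) of the KV-cache saving implied by the
# added README line. Assumed values come from the public TinyLlama-1.1B config:
# 22 layers, 4 KV heads, head dim 64, fp16 cache entries (2 bytes each).

def kv_cache_bytes_per_token(cached_layers: int,
                             num_kv_heads: int = 4,
                             head_dim: int = 64,
                             bytes_per_value: int = 2) -> int:
    """Keys + values stored per token across the layers that keep a cache."""
    return 2 * cached_layers * num_kv_heads * head_dim * bytes_per_value

standard = kv_cache_bytes_per_token(cached_layers=22)  # vanilla TinyLlama
lckv = kv_cache_bytes_per_token(cached_layers=3)        # 2 warmup layers + top layer
print(f"per-token cache: {standard} B -> {lckv} B "
      f"({lckv / standard:.3f} of the original, i.e. 3/22)")
```

Under these assumptions, a 2048-token sequence needs roughly 44 MB of KV cache with the standard model versus about 6 MB with this LCKV variant.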