ahxt/LiteLlama-460M-1T

@@ -26,7 +26,7 @@ We train our models on part of [RedPajama](https://www.together.xyz/blog/redpaja
 ## Training Details
-The model was trained with ~1T tokens (0.98T). num of tokens = steps*length*batch_size=499679*1024*192=98240888832≈0.98T.
+The model was trained with ~1T tokens (0.98T). num of tokens = steps \* length \* batch_size = 499679 \* 1024 \* 192 = 98240888832 ≈ 0.98T.
 The training curve is at this [WandB project](https://wandb.ai/ahxt/llama2_xs_460M_training_loss/reports/reduced_train_loss-23-09-05-20-25-43---Vmlldzo1MzIwNDUx?accessToken=x2ch3n30jo77p1x8y7q9js4h4d8zpjtz1tzot4xxullyefixp4jwt7au2q37k2q6).
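
Review note: the token-count arithmetic in the changed line can be checked with a few lines of Python. The sketch below uses only the three factors quoted in the README; the variable names are illustrative. The product as written comes to 98,240,888,832, which is about 98.2 billion (0.098T), so arriving at 0.98T would need a further factor of roughly 10. One possible explanation is that 192 is a per-device batch size, but the README does not say so, and that reading is only an assumption.

```python
# Factors copied from the "Training Details" line of the README.
steps = 499_679          # optimizer steps
sequence_length = 1024   # tokens per sequence
batch_size = 192         # sequences per step, as stated (may be per device; not confirmed)

tokens_per_step = sequence_length * batch_size
total_tokens = steps * tokens_per_step

print(f"tokens per step: {tokens_per_step:,}")
print(f"total tokens:    {total_tokens:,}")
print(f"in trillions:    {total_tokens / 1e12:.3f}T")
```

If the effective batch size was larger than 192 (for example through data parallelism or gradient accumulation), that multiplier would have to be added to the formula for the stated 0.98T to follow.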