Update README.md
README.md
## Model Details

We built Doge by pre-training on the [Smollm-Corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
> NOTE: These models have not been fine-tuned for instruction following; the instruction-tuned model is [here](https://huggingface.co/JingzeShi/Doge-20M-Instruct).

> TODO: A larger model is under training and will be uploaded soon.
**Pre-Training**:

| Model | Training Data | Steps | Content Length | Tokens | LR | Batch Size | Precision |
|---|---|---|---|---|---|---|---|
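As a rule of thumb, the Tokens column of such a table is the product of Steps, Batch Size (sequences per step), and Content Length (tokens per sequence). A quick helper for sanity-checking a row — the numbers below are placeholders, not Doge's actual hyperparameters:

```python
# Rough total-token count for a pre-training run:
# steps * batch_size (sequences per step) * content_length (tokens per sequence).
# The example values are placeholders, not the actual Doge configuration.
def total_tokens(steps: int, batch_size: int, content_length: int) -> int:
    return steps * batch_size * content_length

tokens = total_tokens(steps=8000, batch_size=256, content_length=2048)
```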
> All evaluations are done using five-shot settings, without additional training on the benchmarks.
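Five-shot means each benchmark query is prefixed with five solved examples before the test question. A minimal sketch of that prompt assembly — the Q/A pairs and the template are illustrative placeholders, not the evaluation harness actually used:

```python
# Build a five-shot prompt: five solved examples, then the test question.
# The Q/A pairs and the "Question:/Answer:" template are hypothetical.
def build_few_shot_prompt(examples, question, k=5):
    shots = examples[:k]
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

demo = [(f"example question {i}", f"example answer {i}") for i in range(1, 6)]
prompt = build_few_shot_prompt(demo, "What is 2 + 2?")
```

The model then completes the text after the final `Answer:`, conditioned on the five worked examples.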
**Procedure**:

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/loser_cheems/huggingface/runs/p8x93v5l)
**Environment**:

- Image: nvcr.io/nvidia/pytorch:24.12-py3