dotw committed
Commit b72b898 · Parent: 863b935

Update README.md

Files changed (1)
  1. README.md +22 -7
README.md CHANGED
@@ -78,7 +78,7 @@ Users (both direct and downstream) should be made aware of the risks, biases and

Use the code below to get started with the model.

- [More Information Needed]
+ [Todo: Insert Code Here]

## Training Details
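The `[Todo: Insert Code Here]` placeholder added above is meant to hold a quick-start snippet. A minimal sketch of what it could look like follows; the repository id `aisingapore/sea-lion-3b` and the use of `trust_remote_code=True` (typical for MPT-style custom model code) are assumptions, not something confirmed by this commit.

```python
# Hypothetical quick-start sketch; the Hugging Face Hub id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/sea-lion-3b"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Sea lions are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```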
 
@@ -117,16 +117,24 @@ SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese,

[More Information Needed]

+ SEA LION 3B was trained on 256 A100 40GB GPUs using MosaicML Composer.

#### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+ | Hyperparameter    | Value              |
+ |-------------------|--------------------|
+ | Precision         | bfloat16           |
+ | Optimizer         | decoupled_adamw    |
+ | Scheduler         | cosine_with_warmup |
+ | Learning Rate     | 1.6e-4             |
+ | Global Batch Size | 1200               |
+ | Micro Batch Size  | 5                  |

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]
+ The training took 14 days to complete.

## Evaluation
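As a rough illustration of how the hyperparameters in the new table map onto MosaicML Composer objects, a sketch along these lines could be used. The warmup duration, betas and weight decay below are assumptions for illustration only, not values taken from this commit.

```python
# Hypothetical sketch of the optimizer/scheduler settings from the table above.
# Warmup duration, betas and weight decay are assumptions, not documented values.
import torch
from composer.optim import CosineAnnealingWithWarmupScheduler, DecoupledAdamW

model = torch.nn.Linear(2560, 2560)  # stand-in for the actual 3B model

optimizer = DecoupledAdamW(
    model.parameters(),
    lr=1.6e-4,          # Learning Rate from the table
    betas=(0.9, 0.95),  # assumed
    weight_decay=0.0,   # assumed
)
scheduler = CosineAnnealingWithWarmupScheduler(t_warmup="2000ba")  # warmup assumed

# In Composer these objects are passed to the Trainer (optimizers=/schedulers=),
# which also handles bf16 precision and micro-batching; exact Trainer arguments
# vary by Composer version.
```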
 
@@ -182,19 +190,26 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

### Model Architecture and Objective

- [More Information Needed]
+ SEA LION 3B is a decoder model based on the MPT architecture.
+
+ | Parameter       | Value  |
+ |-----------------|--------|
+ | Layers          | 40     |
+ | d_model         | ?      |
+ | head_dim        | ?      |
+ | Vocabulary      | 256000 |
+ | Sequence Length | 2048   |

### Compute Infrastructure

- [More Information Needed]

#### Hardware

- [More Information Needed]
+ SEA LION 3B was trained on an AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.

#### Software

- [More Information Needed]
+ SEA LION 3B was trained using MosaicML Composer with PyTorch Fully Sharded Data Parallelism (FSDP).

## Citation [optional]
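The architecture table above leaves `d_model` and `head_dim` unfilled. One way a reader could check those values once the checkpoint is published is to inspect the model config; a hedged sketch, again assuming the id `aisingapore/sea-lion-3b` and MPT-style config attribute names (`n_layers`, `d_model`, `n_heads`, `max_seq_len`), which this commit does not confirm:

```python
# Hypothetical sketch: read the architecture values off the published config.
# Repo id and MPT-style attribute names are assumptions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("aisingapore/sea-lion-3b", trust_remote_code=True)

print("layers     :", config.n_layers)
print("d_model    :", config.d_model)
print("heads      :", config.n_heads)
print("vocab size :", config.vocab_size)
print("seq length :", config.max_seq_len)
```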
 
 
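The Software note mentions PyTorch Fully Sharded Data Parallelism. Purely as a generic illustration of that technique (not the actual Composer training code used for SEA LION 3B), wrapping a model in torch-native FSDP looks roughly like this, launched with `torchrun`:

```python
# Generic FSDP illustration; not the actual SEA LION training code.
# Run with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py  (filename hypothetical)
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(2560, 2560).cuda()  # stand-in for the actual model
model = FSDP(model)  # shards parameters, gradients and optimizer state across ranks
```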