OpenAssistant/llama2-13b-orca-8k-3319


andreaskoepf committed on Jul 24, 2023

Commit 41d97e2 · 1 Parent(s): bd00b25

move dataset composition info

Files changed (1)

README.md +12 -9

@@ -68,6 +68,18 @@ This model was trained on:
 - [shahules786/orca-chat](https://huggingface.co/datasets/shahules786/orca-chat)
 - [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)
 - [atom-in-the-universe/fanfics-10k-50k](https://huggingface.co/datasets/atom-in-the-universe/fanfics-10k-50k)
+```
+Dataset Composition:
+    Tain (sampled):
+       orca-chat: 188842 (100%)
+       fanfics: 47760 (100%)
+       red_pajama: 188262 (25%)
+    Valid:
+       orca-chat: 5000
+       fanfics: 1000
+       red_pajama: 1000
+```
 The dataset [shahules786/orca-chat](https://huggingface.co/datasets/shahules786/orca-chat) combines similar examples of the GPT-4 subset of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) to form longer conversations
 to improve long-context training.
@@ -105,15 +117,6 @@ llama2_13b_orca_8k:
     type: linear
     scale: 2
   datasets:
-    # Dataset Composition:
-    # Tain (sampled):
-    #   orca-chat: 100.00% (188842)
-    #   fanfics: 100.00% (47760)
-    #   red_pajama: 25.00% (188262)
-    # Valid:
-    #   orca-chat: 5000 (71.43%)
-    #   fanfics: 1000 (14.29%)
-    #   red_pajama: 1000 (14.29%)
     - orca-chat:
         max_val_set: 5000
     - fanfics:
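
The validation percentages that appeared in the removed YAML comment were dropped from the relocated block, but they can be recomputed from the raw counts. The following is a minimal sketch for checking them and is not part of the commit. The names `train_counts`, `valid_counts`, and `shares` are hypothetical, and the counts are copied from the diff.

```python
# Example counts copied from the "Dataset Composition" block in the diff.
# Variable and function names are illustrative, not taken from the repository.
train_counts = {"orca-chat": 188842, "fanfics": 47760, "red_pajama": 188262}
valid_counts = {"orca-chat": 5000, "fanfics": 1000, "red_pajama": 1000}


def shares(counts):
    """Return each dataset's share of the total, as a percentage rounded to 2 places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


valid_shares = shares(valid_counts)
train_shares = shares(train_counts)

for name, pct in valid_shares.items():
    print(f"valid {name}: {valid_counts[name]} ({pct:.2f}%)")
for name, pct in train_shares.items():
    print(f"train {name}: {train_counts[name]} ({pct:.2f}%)")
```

The validation shares come out as 71.43%, 14.29%, and 14.29%, matching the removed comment. The percentages next to the training counts in the diff (100%, 100%, 25%) are a different quantity: they give the fraction of each source dataset that was sampled, whereas `train_shares` gives each dataset's share of the combined training mix.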