kenhktsui committed · Commit 99be6ac (verified) · 1 Parent(s): 3b91173

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -47,15 +47,15 @@ predict(["Hi"])

  |Dataset | Sampling | Average Quality Score |
  |--------------------------------------|---|-------------------|
- |[nampdn-ai/tiny-orca-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-orca-textbooks) |First 10,000| 0.8356|
- |[nampdn-ai/tiny-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-textbooks) |First 10,000| 0.7488|
- |[SciPhi/textbooks-are-all-you-need-lite](https://huggingface.co/datasets/SciPhi/textbooks-are-all-you-need-lite) |First 10,000| 0.7182|
- |[vikp/textbook_quality_programming](https://huggingface.co/datasets/vikp/textbook_quality_programming) |First 10,000| 0.5410|
+ |[nampdn-ai/tiny-orca-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-orca-textbooks) |Full | 0.8350|
+ |[nampdn-ai/tiny-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-textbooks) |Full | 0.7535|
+ |[SciPhi/textbooks-are-all-you-need-lite](https://huggingface.co/datasets/SciPhi/textbooks-are-all-you-need-lite) |Full | 0.7202|
+ |[vikp/textbook_quality_programming](https://huggingface.co/datasets/vikp/textbook_quality_programming) |Full| 0.5447|
  |[BEE-spoke-data/fineweb-100k_en-med](https://huggingface.co/datasets/BEE-spoke-data/fineweb-100k_en-med)| Full | 0.4754|
  |[pszemraj/simple_wikipedia_LM](https://huggingface.co/datasets/pszemraj/simple_wikipedia_LM) | Full | 0.4704|
  |[mattymchen/refinedweb-3m](https://huggingface.co/datasets/mattymchen/refinedweb-3m)| Full | 0.2963|
  |[JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile)| Full | 0.2562|


- Average Quality Score is defined as the average probility output of HIGH_QUALITY.
+ Average Quality Score is defined as the average probability output of HIGH_QUALITY.
  The classifier aligns with the expectation. Textbook category scores the highest, reflecting the effectiveness of this model. Wikipedia scores lower because it is not textbook after all. Web scores the lowest.
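
The definition of Average Quality Score in the README amounts to a plain mean over per-document probabilities. A minimal sketch, assuming the classifier's HIGH_QUALITY probability for each document has already been collected into a list; the function name `average_quality_score` and the sample values are hypothetical and do not come from the README:

```python
from statistics import fmean


def average_quality_score(high_quality_probs):
    """Return the mean of per-document P(HIGH_QUALITY) values.

    `high_quality_probs` is assumed to hold one probability per document,
    already extracted from the classifier output (hypothetical input format).
    """
    probs = list(high_quality_probs)
    if not probs:
        raise ValueError("need at least one document")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return fmean(probs)


# Made-up probabilities for three documents, for illustration only.
example_probs = [0.9, 0.7, 0.8]
example_score = average_quality_score(example_probs)
print(f"{example_score:.4f}")
```

Under this reading, switching the Sampling column from "First 10,000" to "Full" only changes which documents feed the mean, not how the score is computed.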