Update README.md
README.md
CHANGED
@@ -47,15 +47,15 @@ predict(["Hi"])
 
 |Dataset | Sampling | Average Quality Score |
 |--------------------------------------|---|-------------------|
-|[nampdn-ai/tiny-orca-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-orca-textbooks) |
-|[nampdn-ai/tiny-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-textbooks) |
-|[SciPhi/textbooks-are-all-you-need-lite](https://huggingface.co/datasets/SciPhi/textbooks-are-all-you-need-lite) |
-|[vikp/textbook_quality_programming](https://huggingface.co/datasets/vikp/textbook_quality_programming) |
+|[nampdn-ai/tiny-orca-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-orca-textbooks) |Full | 0.8350|
+|[nampdn-ai/tiny-textbooks](https://huggingface.co/datasets/nampdn-ai/tiny-textbooks) |Full | 0.7535|
+|[SciPhi/textbooks-are-all-you-need-lite](https://huggingface.co/datasets/SciPhi/textbooks-are-all-you-need-lite) |Full | 0.7202|
+|[vikp/textbook_quality_programming](https://huggingface.co/datasets/vikp/textbook_quality_programming) |Full| 0.5447|
 |[BEE-spoke-data/fineweb-100k_en-med](https://huggingface.co/datasets/BEE-spoke-data/fineweb-100k_en-med)| Full | 0.4754|
 |[pszemraj/simple_wikipedia_LM](https://huggingface.co/datasets/pszemraj/simple_wikipedia_LM) | Full | 0.4704|
 |[mattymchen/refinedweb-3m](https://huggingface.co/datasets/mattymchen/refinedweb-3m)| Full | 0.2963|
 |[JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile)| Full | 0.2562|
 
 
-Average Quality Score is defined as the average
+Average Quality Score is defined as the average probability output of HIGH_QUALITY.
 The classifier aligns with the expectation. Textbook category scores the highest, reflecting the effectiveness of this model. Wikipedia scores lower because it is not textbook after all. Web scores the lowest.
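
The definition added in this change (Average Quality Score as the average probability output of HIGH_QUALITY) can be sketched roughly as below. This is only an illustration, not the README's actual evaluation code: the helper name `average_quality_score`, the per-document dictionary format, and the `LOW_QUALITY` label are assumptions, and the probabilities are made up.

```python
from typing import Mapping, Sequence

# Label name taken from the README text; the input format below is assumed.
HIGH_QUALITY = "HIGH_QUALITY"


def average_quality_score(probabilities: Sequence[Mapping[str, float]]) -> float:
    """Return the mean HIGH_QUALITY probability over all scored documents.

    `probabilities` holds one mapping per document, from label name to the
    classifier's probability for that label (hypothetical format).
    """
    if not probabilities:
        raise ValueError("need at least one scored document")
    return sum(p[HIGH_QUALITY] for p in probabilities) / len(probabilities)


# Made-up classifier outputs for three documents ("LOW_QUALITY" is an assumed label).
example = [
    {"HIGH_QUALITY": 0.75, "LOW_QUALITY": 0.25},
    {"HIGH_QUALITY": 0.50, "LOW_QUALITY": 0.50},
    {"HIGH_QUALITY": 0.25, "LOW_QUALITY": 0.75},
]
print(average_quality_score(example))  # 0.5
```

Under this reading, a dataset's score in the table would be the same mean taken over every document in that dataset, since the Sampling column says "Full".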