Text Classification
fastText
English
kenhktsui committed on
Commit
e1bfefc
1 Parent(s): 5dccdbc

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -88,11 +88,18 @@ predict_educational_value(["Hi"])
88
  # Output: [3.0000010156072676e-05]
89
 
90
  ```
91
- ## Benchmark
92
  As a sanity check, the classifier is applied to various datasets.
93
 
94
  Educational Value = 2 point * P(High) + 1 point * P(Mid) + 0 point * P(Low)
95
 
96
 
97
  |Dataset | Sampling | Average Educational Value | Type |
98
  |--------------------------------------|---|-------------------|-------|
@@ -115,7 +122,11 @@ Educational Value = 2 point * P(High) + 1 point * P(Mid) + 0 point * P(Low)
115
  |[HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)| First 10,000 | 1.058|Real|
116
  |[BEE-spoke-data/fineweb-100k_en-med](https://huggingface.co/datasets/BEE-spoke-data/fineweb-100k_en-med)| First 10,000 | 1.017|Real|
117
  |[JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile)| First 10,000 | 0.994|Real|
 
118
  |[mattymchen/refinedweb-3m](https://huggingface.co/datasets/mattymchen/refinedweb-3m)| First 10,000 | 0.853|Real|
 
 
 
119
  \* I encountered an [issue](https://huggingface.co/datasets/allenai/dolma/discussions/26), so I could not process the original [allenai/dolma](https://huggingface.co/datasets/allenai/dolma).
120
 
121
 
 
88
  # Output: [3.0000010156072676e-05]
89
 
90
  ```
91
+ # Benchmark
92
  As a sanity check, the classifier is applied to various datasets.
93
 
94
  Educational Value = 2 point * P(High) + 1 point * P(Mid) + 0 point * P(Low)
95
 
96
+ The score can be interpreted as:
97
+ |Educational Value| Category |
98
+ |--------|----------|
99
+ |2 | High|
100
+ |1 | Mid|
101
+ |0 | Low|
102
+
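The formula above is an expected value over the three classes, so a score of 0.9, for example, sits just below "Mid". A minimal sketch of the arithmetic follows. It assumes the class probabilities arrive in a dict keyed by the hypothetical names `"High"`, `"Mid"` and `"Low"`; the label strings the model actually emits may differ and would need to be mapped first.

```python
from typing import Mapping

# Point values taken from the README formula: High = 2, Mid = 1, Low = 0.
# The dictionary keys are illustrative names, not the model's real label strings.
POINTS = {"High": 2.0, "Mid": 1.0, "Low": 0.0}


def educational_value(probs: Mapping[str, float]) -> float:
    """Return the expected score: 2 * P(High) + 1 * P(Mid) + 0 * P(Low).

    A class missing from `probs` is treated as having probability 0.
    """
    return sum(POINTS[label] * probs.get(label, 0.0) for label in POINTS)


# A document judged 20% High, 50% Mid, 30% Low.
score = educational_value({"High": 0.2, "Mid": 0.5, "Low": 0.3})
print(round(score, 3))
```

Because the probabilities sum to 1, the result always lies between 0 (certainly Low) and 2 (certainly High).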
103
 
104
  |Dataset | Sampling | Average Educational Value | Type |
105
  |--------------------------------------|---|-------------------|-------|
 
122
  |[HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)| First 10,000 | 1.058|Real|
123
  |[BEE-spoke-data/fineweb-100k_en-med](https://huggingface.co/datasets/BEE-spoke-data/fineweb-100k_en-med)| First 10,000 | 1.017|Real|
124
  |[JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile)| First 10,000 | 0.994|Real|
125
+ |[togethercomputer/RedPajama-Data-V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2)| First 10,000 | 0.979|Real|
126
  |[mattymchen/refinedweb-3m](https://huggingface.co/datasets/mattymchen/refinedweb-3m)| First 10,000 | 0.853|Real|
127
+ |[tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)| First 10,000 | 0.798|Real|
128
+
129
+
130
  \* I encountered an [issue](https://huggingface.co/datasets/allenai/dolma/discussions/26), so I could not process the original [allenai/dolma](https://huggingface.co/datasets/allenai/dolma).
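Each "Average Educational Value" in the table is a mean over the first 10,000 documents of a dataset. One way such a figure could be computed is sketched below; this is not necessarily the author's script. The helper `average_educational_value` and its parameters are hypothetical, and `score_batch` stands for any callable with the same shape as the README's `predict_educational_value` (a list of strings in, a list of floats out).

```python
from itertools import islice
from typing import Callable, Iterable, List


def average_educational_value(
    texts: Iterable[str],
    score_batch: Callable[[List[str]], List[float]],
    sample_size: int = 10_000,
    batch_size: int = 256,
) -> float:
    """Average the scores of the first `sample_size` texts, scored in batches."""
    sample = islice(texts, sample_size)  # keep only the first N documents
    total = 0.0
    count = 0
    while True:
        batch = list(islice(sample, batch_size))  # pull the next batch lazily
        if not batch:
            break
        scores = score_batch(batch)
        total += sum(scores)
        count += len(scores)
    if count == 0:
        raise ValueError("no documents to score")
    return total / count


# Stand-in scorer for illustration only; the real one would call the classifier.
def fake_scorer(batch: List[str]) -> List[float]:
    return [float(len(text)) for text in batch]


print(average_educational_value(["a", "bb", "ccc", "dddd"], fake_scorer, sample_size=3, batch_size=2))
```

Taking the sample with `islice` means a streamed dataset never has to be loaded in full, which matters for corpora of this size.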
131
 
132