rluukkon committed on
Commit 21eea95
1 Parent(s): 6709000

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -27,7 +27,7 @@ mC4 multilingual colossal, cleaned Common Crawl https://huggingface.co/datasets/
 
 
 
- **Sampling ratios**
+ **Sampling ratios for Finnish**
 
  |Dataset | Chars | Ratio | Weight | W.Ratio |
  |----------|--------|---------|--------|---------|
@@ -43,3 +43,5 @@ mC4 multilingual colossal, cleaned Common Crawl https://huggingface.co/datasets/
  |Suomi24 | 20.6B | 9.9\% | 1.0 | 8.9\%|
  |Reddit-Fi | 0.7B | 0.4\% | 1.0 | 0.3\%|
  |**TOTAL** | **207.0B** | **100.0\%** | **N/A** | **100.0\%** |
+
+ And for whole continued pretraining, ROOTS is mixed in.
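
The README does not spell out how the `W.Ratio` column is derived. One plausible reading, offered here only as an assumption, is that each dataset's character count is multiplied by its weight and then normalised by the weighted total. The sketch below illustrates that reading. The function name `weighted_ratios` and the `other_upweighted` entry are hypothetical: the latter is a made-up stand-in for the table rows not visible in this diff, so the printed percentages will not match the README.

```python
def weighted_ratios(chars, weights):
    """Return each dataset's share of the weighted character total.

    Assumed formula (not confirmed by the README):
        w_ratio[d] = chars[d] * weights[d] / sum(chars[k] * weights[k])
    """
    weighted = {name: chars[name] * weights[name] for name in chars}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


# Suomi24 and Reddit-Fi character counts and weights are taken from the table.
# "other_upweighted" is a hypothetical placeholder for the rows not shown here.
chars = {"Suomi24": 20.6e9, "Reddit-Fi": 0.7e9, "other_upweighted": 10.0e9}
weights = {"Suomi24": 1.0, "Reddit-Fi": 1.0, "other_upweighted": 2.0}

ratios = weighted_ratios(chars, weights)
for name, share in ratios.items():
    print(f"{name}: {share:.1%}")
```

Under this reading, a dataset with weight 1.0 ends up with a weighted share below its raw share whenever other datasets are upweighted, which matches the direction of the Suomi24 row (9.9\% raw, 8.9\% weighted).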