tokyo-electron-device-ai committed on
Commit
eece17e
1 Parent(s): 42b973c

Update README.md

  1. README.md +4 -4
README.md CHANGED
@@ -60,10 +60,10 @@ We follow the approach described in [Bilingual Adaptation of Monolingual Foundat
  ### Training data
  This model was continuously trained on 173B tokens, with the training data consisting of 20% English and 80% Japanese. The raw Japanese data was filtered using scripts from [llm-jp-corpus repository](https://github.com/llm-jp/llm-jp-corpus). The following Japanese datasets were included into the training data mixture:
 
- - [legacy-datasets/mc4](https://huggingface.co/datasets/legacy-datasets/mc4)
- - [range3/cc100-ja](https://huggingface.co/datasets/range3/cc100-ja)
- - [if001/oscar_2023_filtered](https://huggingface.co/datasets/if001/oscar_2023_filtered)
- - [dumps.wikimedia.org](https://dumps.wikimedia.org/)
+ * **[legacy-datasets/mc4](https://huggingface.co/datasets/legacy-datasets/mc4)**
+ * **[range3/cc100-ja](https://huggingface.co/datasets/range3/cc100-ja)**
+ * **[if001/oscar_2023_filtered](https://huggingface.co/datasets/if001/oscar_2023_filtered)**
+ * **[dumps.wikimedia.org](https://dumps.wikimedia.org/)**
  * Note this released model was trained exclusively on open-source datasets. We also trained models using proprietary domain-specific data, but there are no plans to release those models.
 
  ### Hyper-parameters