Jeronymous committed · Commit 162f8b7 · 1 parent: 0aeb68b

Add links to dataset and code
README.md CHANGED
```diff
@@ -139,7 +139,7 @@ prompt = """\
 
 ### Training Data
 
-The training dataset
+The training dataset is available at [OpenLLM-France/Claire-Dialogue-French-0.1](https://huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1).
 
 Claire-Mistral-7B-0.1 was tuned from Mistral-7B-v0.1 on the following data distribution:
 
@@ -147,10 +147,10 @@ Claire-Mistral-7B-0.1 was tuned from Mistral-7B-v0.1 on the following data distr
 |-------------------------------|------------|------------------------------|-----------------------------------------------------|
 | Parliamentary Proceedings     | 135M       | 35%                          | Assemblée Nationale                                 |
 | Theatre                       | 16M        | 18%                          | Théâtre Classique, Théâtre Gratuit                  |
-| Interviews                    | 6.4M       | 29%                          | TCOF, CFPP, CFPB, ACSYNT, PFC, Valibel (ORFEO), ESLO|
+| Interviews                    | 6.4M       | 29%                          | TCOF, CFPP, CFPB (ORFEO), ACSYNT, PFC, Valibel (ORFEO), ESLO|
 | Free Conversations            | 2.2M       | 10%                          | CRFP (ORFEO), OFROM (ORFEO), CID, Rhapsodie, ParisStories, PFC, CLAPI, C-ORAL-ROM (ORFEO), LinTO, ESLO |
 | Meetings                      | 1.2M       | 5%                           | SUMM-RE, LinTO, Réunions de travail (ORFEO)         |
-| Debates                       | 402k       | <2%                          |
+| Debates                       | 402k       | <2%                          | FREDSum, ESLO                                       |
 | Assistance                    | 159k       | <1%                          | Fleuron (ORFEO), Accueil UBS, OTG, ESLO             |
 | Presentation, Formal Address  | 86k        | <0.5%                        | Valibel (ORFEO), LinTO, ESLO                        |
 
@@ -165,7 +165,7 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
-The training code
+The training code is available at [https://github.com/OpenLLM-France/Lit-Claire](https://github.com/OpenLLM-France/Lit-Claire).
 
 Claire-Mistral-7B-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) for more details.
```
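For context, the dataset linked in this commit can be loaded directly from the Hugging Face Hub. Below is a minimal sketch using the `datasets` library; the `split` name and the inspected fields are illustrative assumptions, not taken from the dataset card:

```python
# Minimal sketch: load the training dataset referenced in the commit.
# Requires: pip install datasets
from datasets import load_dataset

# Dataset ID taken from the link added in this commit; the "train" split is assumed.
ds = load_dataset("OpenLLM-France/Claire-Dialogue-French-0.1", split="train")

print(ds)      # row count and column names
print(ds[0])   # one dialogue sample (actual field names may differ)
```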
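Since Claire-Mistral-7B-0.1 is a causal decoder-only model, using it amounts to autoregressive next-token prediction. A minimal generation sketch with `transformers` follows; the repository ID is inferred from the model name, and the dialogue-style prompt and sampling parameters are illustrative assumptions (the README's own prompt example, referenced in the first hunk above, is canonical):

```python
# Minimal sketch: next-token generation with a causal decoder-only checkpoint.
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLM-France/Claire-Mistral-7B-0.1"  # assumed Hub ID, inferred from the model name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative French dialogue prompt; see the model card for the real format.
prompt = "[Intervenant 1:] Bonjour, comment allez-vous ?\n[Intervenant 2:]"
inputs = tokenizer(prompt, return_tensors="pt")

# Causal LM: the model predicts one token at a time, conditioned on everything before it.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```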