Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ This tokenizer is specifically designed for working with **Réunion Creole**, a
 ## Features
 
 - Built using the **BPE (Byte Pair Encoding)** model.
-- Trained on "LA RIME, Mo i akorde dann bal zakor".
+- Trained on "LA RIME, Mo i akorde dann bal zakor", a free-access book.
 - Supports special tokens for common NLP tasks:
 - `[CLS]`: Start-of-sequence token for classification tasks.
 - `[SEP]`: Separator token for multi-segment inputs.
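
For readers unfamiliar with the `[CLS]` and `[SEP]` tokens listed in the Features section, the following is a minimal sketch of how such markers typically frame one or two segments. It assumes a BERT-style layout (`[CLS] A [SEP]` and `[CLS] A [SEP] B [SEP]`), which the README does not confirm. The helper `frame_tokens` is hypothetical and not part of this tokenizer, and the whitespace-split words stand in for real BPE pieces.

```python
from typing import List, Optional

# Special tokens named in the README's Features list.
CLS, SEP = "[CLS]", "[SEP]"


def frame_tokens(first: List[str], second: Optional[List[str]] = None) -> List[str]:
    """Wrap one or two token lists in [CLS]/[SEP] markers.

    Assumption: BERT-style framing. The tokenizer's actual
    post-processing may differ.
    """
    framed = [CLS, *first, SEP]
    if second is not None:
        # A second segment is closed by its own separator.
        framed += [*second, SEP]
    return framed


# Placeholder tokens taken from the training book's title, not real BPE output.
single = frame_tokens(["mo", "i", "akorde"])
pair = frame_tokens(["mo", "i"], ["dann", "bal"])
print(single)
print(pair)
```

With real tokenizer output, the token lists would hold subword pieces rather than whole words, but the framing logic would stay the same under the stated assumption.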