Pclanglais
committed on
Update README.md
README.md
CHANGED
@@ -6,7 +6,10 @@ Estienne was trained on 2,000 example of manually annotated texts, excerpted at
Given the diversity of the corpus, Estienne should work on diverse document formats in European languages.
-
Estienne supports the following segmentations:
* **Text**
@@ -21,4 +24,4 @@ Estienne supports the following segmentations:
* **Date** - statement of date and time, common in letters and newspaper articles.
* **Keyword** - list of keywords, especially common in scientific publications.
-
Given the diversity of the corpus, Estienne should work on diverse document formats in European languages.
+
The model is named after the humanist Henri Estienne, who introduced many text segmentation practices still used in scholarly editing today.
+
+
## Use
+
Because DeBERTa removes newlines by default and its tokenizer has no support for them, newlines in the input should be replaced with pilcrows (¶).
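The newline replacement could be sketched as follows. This is a minimal illustration, not the author's preprocessing code: `newlines_to_pilcrows` is a hypothetical helper name, and it assumes each line break maps to exactly one pilcrow with no surrounding spaces, which the README does not specify.

```python
PILCROW = "¶"


def newlines_to_pilcrows(text: str) -> str:
    """Replace every line break with a pilcrow before tokenization.

    Assumption: one pilcrow per line break, no added whitespace.
    """
    # Normalize Windows (\r\n) and bare carriage-return line endings first,
    # so each line break yields a single pilcrow.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", PILCROW)


sample = "Title\nBody\r\nEnd"
print(newlines_to_pilcrows(sample))  # prints: Title¶Body¶End
```

The converted string can then be passed to the tokenizer in place of the raw text.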
Estienne supports the following segmentations:
* **Text**
* **Date** - statement of date and time, common in letters and newspaper articles.
* **Keyword** - list of keywords, especially common in scientific publications.
+
## Example