nevmenandr
committed on
Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ The texts for the training corpus are taken from two datasets published in the [

 Казакова, Елена, 2023, "[Забытые романы русских писателей из фондов Пушкинского Дома (1857–1917)](https://dataverse.pushdom.ru/dataset.xhtml?persistentId=doi:10.31860/openlit-2023.12-C007)", https://doi.org/10.31860/openlit-2023.12-C007, Репозиторий открытых данных по русской литературе и фольклору, V2, UNF:6:DCGrSrMDXXtoRfHBDWfS4A== [fileUNF]

-Only texts published after
+Only texts published after 1845 (the era of realism) remain in the corpus. Texts presented in old orthography have been converted to modern orthography with the help of a [package](https://pypi.org/project/prereform2modern/).

 The texts are marked up using the Russian version of the [booknlp](https://github.com/booknlp/booknlp) library, which highlighted the characters of the fictional works.

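
The preprocessing described in the added README line (keeping only texts published after 1845 and converting pre-reform spelling) might look roughly like the sketch below. It is an illustration under stated assumptions, not the author's pipeline: it does not call `prereform2modern`, and instead substitutes a deliberately naive letter mapping that covers only a few of the 1918 reform rules. The record layout (`year`, `text`) and the function names `modernize` and `filter_corpus` are hypothetical, and "after 1845" is assumed to mean strictly greater than 1845.

```python
import re

# Letters dropped by the 1918 spelling reform, mapped to their modern
# equivalents. This is a naive stand-in, NOT the prereform2modern logic.
LETTER_MAP = str.maketrans({
    "ѣ": "е", "Ѣ": "Е",
    "і": "и", "І": "И",
    "ѳ": "ф", "Ѳ": "Ф",
    "ѵ": "и", "Ѵ": "И",
})

# A hard sign at the very end of a word was dropped by the reform;
# a hard sign inside a word (e.g. "объявить") is kept.
FINAL_HARD_SIGN = re.compile(r"[ъЪ]\b")


def modernize(text):
    """Apply the simplified old-to-modern orthography rules to a string."""
    return FINAL_HARD_SIGN.sub("", text.translate(LETTER_MAP))


def filter_corpus(records, cutoff_year=1845):
    """Keep records published after cutoff_year and modernize their text.

    Each record is assumed to be a dict with hypothetical keys
    'year' (int) and 'text' (str).
    """
    return [
        {**record, "text": modernize(record["text"])}
        for record in records
        if record["year"] > cutoff_year
    ]


if __name__ == "__main__":
    sample = [
        {"year": 1840, "text": "Онъ былъ въ лѣсу"},
        {"year": 1860, "text": "Онъ былъ въ лѣсу"},
    ]
    for record in filter_corpus(sample):
        print(record["year"], record["text"])
```

A real conversion needs many more rules (adjective endings, prefixes such as без-/бес-, word-specific exceptions), which is why the README points to a dedicated package rather than a character table like this one.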