abdiharyadi committed "Update README.md"
README.md
@@ -28,7 +28,7 @@ The dataset consists of 388 Indonesian fable stories.
 These stories were gathered from [dongengceritarakyat.com](https://dongengceritarakyat.com/) on January 8, 2024.
 Duplicated stories without any paraphrasing were removed, based on the cosine similarity of their TF-IDF vectors over word trigrams.
 The remaining stories were then cleaned manually to remove non-fable stories and incomplete stories (e.g., synopses), and to fix some misused punctuation and typos.
-
+If a mistake is found, the dataset will be corrected as soon as possible.
 
 The cleaned stories were split with an 80:10:10 ratio, giving
 - 310 stories for training,
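The deduplication step described in this hunk can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' script: it assumes lowercased whitespace tokenization, a smoothed IDF similar in spirit to scikit-learn's default, greedy keep-first filtering, and a made-up similarity threshold of 0.9. The README does not state the tokenizer, the IDF variant, or the threshold actually used.

```python
import math
from collections import Counter


def word_trigrams(text: str) -> list[tuple[str, ...]]:
    """Lowercase, split on whitespace, and return consecutive word triples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def tfidf_vectors(texts: list[str]) -> list[dict]:
    """Build one sparse TF-IDF vector (trigram -> weight) per text."""
    counts = [Counter(word_trigrams(t)) for t in texts]
    n = len(texts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: +1 per text containing the trigram
    # Smoothed IDF (assumed variant, not necessarily the one used for this dataset).
    idf = {g: math.log((1 + n) / (1 + d)) + 1 for g, d in df.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def drop_duplicates(texts: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first occurrence; drop later texts too similar to a kept one.

    The default threshold is a hypothetical value chosen for illustration.
    """
    vectors = tfidf_vectors(texts)
    kept: list[int] = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [texts[i] for i in kept]
```

Texts shorter than three words yield empty vectors and are always kept. The pairwise comparison is quadratic, which is acceptable for a corpus of a few hundred stories.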
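The 80:10:10 split can be sketched the same way. As a quick check, 80% of the 388 stories is 310.4, so rounding the training share down gives the 310 training stories listed. The shuffle, the seed, and the rule for dividing the remainder between validation and test are assumptions for illustration only; the README excerpt does not say how the split was drawn.

```python
import random


def split_80_10_10(items, seed: int = 0) -> tuple[list, list, list]:
    """Shuffle a copy of items and cut it into train/validation/test parts.

    Hypothetical rules: the training size is int(0.8 * n), the remainder is
    halved for validation, and any odd leftover goes to the test split.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = int(0.8 * len(shuffled))
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_80_10_10(range(388))
print(len(train))  # 310
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.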