MarcosDib committed
Commit 2522575
1 Parent(s): a854db5

Update README.md

Files changed (1): README.md (+7 -7)
README.md CHANGED
@@ -83,7 +83,7 @@ Other 24 smaller models are released afterward.
 The detailed release history can be found [here](https://huggingface.co/unb-lamfo-nlp-mcti) on Hugging Face.
 
 | Model | #params | Language |
-|:----------------------------:|:-------:|:--------:|
+|------------------------------|:-------:|:--------:|
 | [`mcti-base-uncased`] | 110M | English |
 | [`mcti-large-uncased`] | 340M | English |
 | [`mcti-base-cased`] | 110M | English |
@@ -91,7 +91,7 @@ The detailed release history can be found [here](https://huggingface.co/u
 | [`-base-multilingual-cased`] | 110M | Multiple |
 
 | Dataset | Compatibility to base* |
-|:------------------------------------:|:----------------------:|
+|--------------------------------------|:----------------------:|
 | Labeled MCTI | 100% |
 | Full MCTI | 100% |
 | BBC News Articles | 56.77% |
@@ -202,13 +202,13 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
 
-From the database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a notebook was developed in [Google Colab](colab.research.google.com)
-to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which can also be found on the project's GitHub.
+From the database obtained in Meta 4, stored in the project's [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a notebook was developed in [Google Colab](https://colab.research.google.com)
+to implement the [pre-processing code](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which can also be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
 
 | Objective | Package |
-|:------------------------------------------------------:|:------------:|
+|--------------------------------------------------------|--------------|
 | Resolve contractions and slang usage in text | [contractions](https://pypi.org/project/contractions) |
 | Natural Language Processing | [nltk](https://pypi.org/project/nltk) |
 | Other data manipulations and calculations, plus the modules included in Python 3.10: io, json, math, re (regular expressions), shutil, time, unicodedata | [numpy](https://pypi.org/project/numpy) |
@@ -224,7 +224,7 @@ As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 
 | Base | Pre-processing applied to the original texts |
-|:------:|:------------------------------------------------------------:|
+|--------|--------------------------------------------------------------|
 | xp1 | Expand contractions |
 | xp2 | Expand contractions + Convert text to lowercase |
 | xp3 | Expand contractions + Remove punctuation |
@@ -233,7 +233,7 @@ bases, derived from the base of goal 4, with the application of the methods show
 | xp6 | xp4 + Lemmatization |
 | xp7 | xp4 + Stemming + Stopword removal |
 | xp8 | xp4 + Lemmatization + Stopword removal |
-Table 2 – Pre-processing methods evaluated
+Table 2 – Pre-processing methods evaluated
 
 ### Pretraining
 
 
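For context, the pre-processing variants in Table 2 combine a small set of text transformations built on the packages listed in the diff above (contractions and nltk). The sketch below is a minimal illustration of those building blocks, not the project's notebook code: the function names are hypothetical, and it assumes the NLTK `stopwords` and `wordnet` corpora can be downloaded.

```python
# Minimal sketch of the pre-processing building blocks (illustrative only, not
# the project's notebook code). Assumes `pip install contractions nltk`.
import string

import contractions
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()


def expand_contractions(text: str) -> str:
    # e.g. "don't" -> "do not", "it's" -> "it is"
    return contractions.fix(text)


def to_lowercase(text: str) -> str:
    return text.lower()


def remove_punctuation(text: str) -> str:
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lemmatize(tokens: list[str]) -> list[str]:
    return [lemmatizer.lemmatize(t) for t in tokens]


def stem(tokens: list[str]) -> list[str]:
    return [stemmer.stem(t) for t in tokens]


# Example: chaining several of the steps above (expand contractions, lowercase,
# remove punctuation, then lemmatize and drop stopwords).
text = "Researchers haven't announced the new funding calls yet."
tokens = remove_punctuation(to_lowercase(expand_contractions(text))).split()
print(lemmatize(remove_stopwords(tokens)))
```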
 
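The assumptions above also state that each pre-processed base is compared by accuracy in a shallow neural network (SNN). That network is not shown in this commit, so the sketch below stands in with a single-hidden-layer scikit-learn MLPClassifier over TF-IDF features; the variant names, texts, and labels are placeholders, not project data.

```python
# Illustrative comparison of pre-processing variants with a shallow classifier.
# The one-hidden-layer MLP over TF-IDF features is a stand-in for the project's
# SNN, which this commit does not show; all data below is placeholder data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Each variant maps to the same documents after its own pre-processing;
# labels mark texts as relevant (1) or irrelevant (0) to the classification goal.
labels = [1, 1, 1, 0, 0, 0]
variants = {
    "xp2": [
        "do not miss this research funding call",
        "grants for artificial intelligence projects are open",
        "new scholarship program for phd candidates",
        "the weather was nice during the weekend",
        "the local team won the regional football match",
        "a recipe for a quick vegetable soup",
    ],
    "xp8": [
        "miss research funding call",
        "grant artificial intelligence project open",
        "new scholarship program phd candidate",
        "weather nice weekend",
        "local team win regional football match",
        "recipe quick vegetable soup",
    ],
}

for name, texts in variants.items():
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=1 / 3, stratify=labels, random_state=42
    )
    snn = make_pipeline(
        TfidfVectorizer(),
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=42),
    )
    snn.fit(X_train, y_train)
    print(f"{name}: accuracy = {accuracy_score(y_test, snn.predict(X_test)):.2f}")
```

On a real labeled base, the same loop would surface which pre-processing combination yields the best held-out accuracy.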