BecomeAllan committed
Commit 273581b
1 Parent(s): 53e3dbf

update links to nlp-mcti-ppf

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -170,8 +170,8 @@ The following assumptions were considered:
  - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
  - Pre-processing was investigated for the classification goal.
 
- From the Database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](https://colab.research.google.com)
- to implement the [preprocessing code](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
+ From the Database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Data/scrapy/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](https://colab.research.google.com)
+ to implement the [preprocessing code](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 
  Several Python packages were used to develop the preprocessing code:
 
@@ -189,7 +189,7 @@ Table 3: Python packages used
  | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
- As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), in the pre-processing, code was created to build and evaluate 8 (eight) different
+ As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/MCTI_PPF_Pr%C3%A9_processamento.ipynb), in the pre-processing, code was created to build and evaluate 8 (eight) different
  bases, derived from the base of goal 4, with the application of the methods shown in table 4.
 
  Table 4: Preprocessing methods evaluated
@@ -234,7 +234,7 @@ was the computational cost required to train the vector representation models (w
  document-embedding). The training time is so close that it did not have such a large weight for the analysis.
 
  As the last step, a spreadsheet was generated for the model (xp8) with the fields opo_pre and opo_pre_tkn, containing the
- preprocessed text in sentence format and tokens, respectively. This [database](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/oportunidades_final_pre_processado.xlsx) was made
+ preprocessed text in sentence format and tokens, respectively. This [database](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/oportunidades_final_pre_processado.xlsx) was made
  available on the project's GitHub with the inclusion of columns opo_pre (text) and opo_pre_tkn (tokenized).
 
  ### Pretraining
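
For readers skimming the last hunk above: it describes exporting the xp8 preprocessing result to a spreadsheet with the columns opo_pre (preprocessed text in sentence format) and opo_pre_tkn (the same text tokenized). Below is a minimal, hedged sketch of what that export step could look like; it is not the project's notebook. The source column name `opo_texto` and the lower-casing step are placeholder assumptions standing in for the full pipeline evaluated in Table 4, and the translation-to-English step (the translators package from Table 3) is omitted.

```python
# Minimal sketch of the export step described in the diff -- NOT the project's
# actual notebook. Assumes pandas, nltk and openpyxl are installed, and that
# the labelled base from Goal 4 is available locally. The column "opo_texto"
# and the lower-casing are placeholders for the real preprocessing pipeline.
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models used by word_tokenize

# Labelled base from Goal 4 (file name taken from the link in the diff).
df = pd.read_excel("db_PPF_validacao_para UNB_ FINAL.xlsx")

# opo_pre: preprocessed text kept in sentence format (placeholder cleaning).
df["opo_pre"] = df["opo_texto"].astype(str).str.lower()

# opo_pre_tkn: the same text as a list of tokens.
df["opo_pre_tkn"] = df["opo_pre"].apply(word_tokenize)

# Spreadsheet published on the project's GitHub for the xp8 model.
df.to_excel("oportunidades_final_pre_processado.xlsx", index=False)
```

Note that writing a list-valued column to .xlsx stores each list as its string representation, so code reading the spreadsheet back would need to parse opo_pre_tkn (e.g. with ast.literal_eval).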