MarcosDib committed
Commit 2522575
1 Parent(s): a854db5

Update README.md

Files changed (1): README.md (+7 -7)
README.md CHANGED
@@ -83,7 +83,7 @@ Other 24 smaller models are released afterward.
 The detailed release history can be found [here](https://huggingface.co/unb-lamfo-nlp-mcti) on Hugging Face.
 
 | Model | #params | Language |
-|:----------------------------:|:-------:|:--------:|
+|------------------------------|:-------:|:--------:|
 | [`mcti-base-uncased`] | 110M | English |
 | [`mcti-large-uncased`] | 340M | English |
 | [`mcti-base-cased`] | 110M | English |
@@ -91,7 +91,7 @@ The detailed release history can be found [here](https://huggingface.co/u
 | [`-base-multilingual-cased`] | 110M | Multiple |
 
 | Dataset | Compatibility to base* |
-|:------------------------------------:|:----------------------:|
+|--------------------------------------|:----------------------:|
 | Labeled MCTI | 100% |
 | Full MCTI | 100% |
 | BBC News Articles | 56.77% |
@@ -202,13 +202,13 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
 
-From the database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a notebook was developed in [Google Colab](colab.research.google.com)
-to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which can also be found on the project's GitHub.
+From the database obtained in Meta 4, stored in the project's [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a notebook was developed in [Google Colab](https://colab.research.google.com)
+to implement the [pre-processing code](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which can also be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
 
 | Objective | Package |
-|:------------------------------------------------------:|:------------:|
+|--------------------------------------------------------|--------------|
 | Resolve contractions and slang usage in text | [contractions](https://pypi.org/project/contractions) |
 | Natural Language Processing | [nltk](https://pypi.org/project/nltk) |
 | Other data manipulations and calculations, plus the modules included in Python 3.10: io, json, math, re (regular expressions), shutil, time, unicodedata | [numpy](https://pypi.org/project/numpy) |
@@ -224,7 +224,7 @@ As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 
 | Base | Pre-processing applied to the original texts |
-|:------:|:------------------------------------------------------------:|
+|--------|--------------------------------------------------------------|
 | xp1 | Expand contractions |
 | xp2 | Expand contractions + Convert text to lowercase |
 | xp3 | Expand contractions + Remove punctuation |
@@ -233,7 +233,7 @@ bases, derived from the base of goal 4, with the application of the methods show
 | xp6 | xp4 + Lemmatization |
 | xp7 | xp4 + Stemming + Stopword removal |
 | xp8 | xp4 + Lemmatization + Stopword removal |
-Table 2 – Pre-processing methods evaluated
+Table 2 – Pre-processing methods evaluated
 
 ### Pretraining
 
 
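For context, the pre-processing variants in Table 2 combine a small set of text transformations built on the packages listed in the diff above (contractions and nltk). The sketch below is a minimal illustration of those building blocks, not the project's notebook code: the function names are hypothetical, and it assumes the NLTK `stopwords` and `wordnet` corpora can be downloaded.

```python
# Minimal sketch of the pre-processing building blocks (illustrative only, not
# the project's notebook code). Assumes `pip install contractions nltk`.
import string

import contractions
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()


def expand_contractions(text: str) -> str:
    # e.g. "don't" -> "do not", "it's" -> "it is"
    return contractions.fix(text)


def to_lowercase(text: str) -> str:
    return text.lower()


def remove_punctuation(text: str) -> str:
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lemmatize(tokens: list[str]) -> list[str]:
    return [lemmatizer.lemmatize(t) for t in tokens]


def stem(tokens: list[str]) -> list[str]:
    return [stemmer.stem(t) for t in tokens]


# Example: chaining several of the steps above (expand contractions, lowercase,
# remove punctuation, then lemmatize and drop stopwords).
text = "Researchers haven't announced the new funding calls yet."
tokens = remove_punctuation(to_lowercase(expand_contractions(text))).split()
print(lemmatize(remove_stopwords(tokens)))
```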
 
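The assumptions above also state that each pre-processed base is compared by accuracy in a shallow neural network (SNN). That network is not shown in this commit, so the sketch below stands in with a single-hidden-layer scikit-learn MLPClassifier over TF-IDF features; the variant names, texts, and labels are placeholders, not project data.

```python
# Illustrative comparison of pre-processing variants with a shallow classifier.
# The one-hidden-layer MLP over TF-IDF features is a stand-in for the project's
# SNN, which this commit does not show; all data below is placeholder data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Each variant maps to the same documents after its own pre-processing;
# labels mark texts as relevant (1) or irrelevant (0) to the classification goal.
labels = [1, 1, 1, 0, 0, 0]
variants = {
    "xp2": [
        "do not miss this research funding call",
        "grants for artificial intelligence projects are open",
        "new scholarship program for phd candidates",
        "the weather was nice during the weekend",
        "the local team won the regional football match",
        "a recipe for a quick vegetable soup",
    ],
    "xp8": [
        "miss research funding call",
        "grant artificial intelligence project open",
        "new scholarship program phd candidate",
        "weather nice weekend",
        "local team win regional football match",
        "recipe quick vegetable soup",
    ],
}

for name, texts in variants.items():
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=1 / 3, stratify=labels, random_state=42
    )
    snn = make_pipeline(
        TfidfVectorizer(),
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=42),
    )
    snn.fit(X_train, y_train)
    print(f"{name}: accuracy = {accuracy_score(y_test, snn.predict(X_test)):.2f}")
```

On a real labeled base, the same loop would surface which pre-processing combination yields the best held-out accuracy.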