Update README.md
README.md CHANGED
```diff
@@ -90,12 +90,12 @@ The detailed release history can be found on the [here](https://huggingface.co/u
 | [`mcti-large-cased`] | 110M | Chinese |
 | [`-base-multilingual-cased`] | 110M | Multiple |
 
-| Dataset
-
-| Labeled MCTI
-| Full MCTI
-| BBC News Articles
-| New unlabeled MCTI
 
 
 ## Intended uses
```
```diff
@@ -121,18 +121,6 @@ You can use this model directly with a pipeline for masked language modeling:
   'score': 0.1073106899857521,
   'token': 4827,
   'token_str': 'fashion'},
- {'sequence': "[CLS] hello i'm a role model. [SEP]",
-  'score': 0.08774490654468536,
-  'token': 2535,
-  'token_str': 'role'},
- {'sequence': "[CLS] hello i'm a new model. [SEP]",
-  'score': 0.05338378623127937,
-  'token': 2047,
-  'token_str': 'new'},
- {'sequence': "[CLS] hello i'm a super model. [SEP]",
-  'score': 0.04667217284440994,
-  'token': 3565,
-  'token_str': 'super'},
  {'sequence': "[CLS] hello i'm a fine model. [SEP]",
   'score': 0.027095865458250046,
   'token': 2986,
```
```diff
@@ -175,18 +163,6 @@ predictions:
   'score': 0.09747550636529922,
   'token': 10533,
   'token_str': 'carpenter'},
- {'sequence': '[CLS] the man worked as a waiter. [SEP]',
-  'score': 0.0523831807076931,
-  'token': 15610,
-  'token_str': 'waiter'},
- {'sequence': '[CLS] the man worked as a barber. [SEP]',
-  'score': 0.04962705448269844,
-  'token': 13362,
-  'token_str': 'barber'},
- {'sequence': '[CLS] the man worked as a mechanic. [SEP]',
-  'score': 0.03788609802722931,
-  'token': 15893,
-  'token_str': 'mechanic'},
  {'sequence': '[CLS] the man worked as a salesman. [SEP]',
   'score': 0.037680890411138535,
   'token': 18968,
```
```diff
@@ -198,18 +174,6 @@ predictions:
   'score': 0.21981462836265564,
   'token': 6821,
   'token_str': 'nurse'},
- {'sequence': '[CLS] the woman worked as a waitress. [SEP]',
-  'score': 0.1597415804862976,
-  'token': 13877,
-  'token_str': 'waitress'},
- {'sequence': '[CLS] the woman worked as a maid. [SEP]',
-  'score': 0.1154729500412941,
-  'token': 10850,
-  'token_str': 'maid'},
- {'sequence': '[CLS] the woman worked as a prostitute. [SEP]',
-  'score': 0.037968918681144714,
-  'token': 19215,
-  'token_str': 'prostitute'},
  {'sequence': '[CLS] the woman worked as a cook. [SEP]',
   'score': 0.03042375110089779,
   'token': 5660,
```
```diff
@@ -233,14 +197,14 @@ Pre-processing was used to standardize the texts for the English language, reduc
 optimize the training of the models.
 
 The following assumptions were considered:
-
-
-
-
 
-From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-
 to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-
-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento
 
 Several Python packages were used to develop the preprocessing code:
```
```diff
@@ -257,8 +221,7 @@ Several Python packages were used to develop the preprocessing code:
 | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
-As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-
-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), in the pre-processing, code was created to build and evaluate 8 (eight) different
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 
 | Base | Original texts |
```
```diff
 | [`mcti-large-cased`] | 110M | Chinese |
 | [`-base-multilingual-cased`] | 110M | Multiple |
 
+| Dataset            | Compatibility to base* |
+|--------------------|------------------------|
+| Labeled MCTI       | 100%                   |
+| Full MCTI          | 100%                   |
+| BBC News Articles  | 56.77%                 |
+| New unlabeled MCTI | 75.26%                 |
 
 
 ## Intended uses
```
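The excerpt does not expand the asterisk on "Compatibility to base*". Assuming the metric is the share of a corpus's tokens that appear in the base dataset's vocabulary (an assumption on this sketch's part, not stated in the source), it could be computed along these lines, with entirely made-up data:

```python
def vocab_coverage(corpus_tokens, base_vocab):
    """Percentage of corpus tokens found in the base vocabulary."""
    if not corpus_tokens:
        return 0.0
    hits = sum(1 for tok in corpus_tokens if tok in base_vocab)
    return 100.0 * hits / len(corpus_tokens)

# Made-up vocabulary and corpus, purely for illustration.
base_vocab = {"research", "funding", "innovation", "technology"}
corpus = ["research", "grant", "funding", "technology"]
print(f"{vocab_coverage(corpus, base_vocab):.2f}%")  # -> 75.00%
```

Whether the reported percentages were computed at the word, subword, or document level is not stated in this excerpt.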
```diff
   'score': 0.1073106899857521,
   'token': 4827,
   'token_str': 'fashion'},
  {'sequence': "[CLS] hello i'm a fine model. [SEP]",
   'score': 0.027095865458250046,
   'token': 2986,
```
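The hunks above trim the pipeline output down to its highest- and lowest-scoring predictions. Each prediction is a plain Python dict, so re-ranking or truncating such a list is a one-liner; a small sketch (scores copied from the output above, not recomputed by running the model):

```python
# Fill-mask pipelines return one dict per candidate token. The scores
# below are copied from the README output above, not produced here.
predictions = [
    {"score": 0.027095865458250046, "token": 2986, "token_str": "fine"},
    {"score": 0.1073106899857521, "token": 4827, "token_str": "fashion"},
]

def top_k(preds, k):
    """Return the k highest-scoring predictions, best first."""
    return sorted(preds, key=lambda p: p["score"], reverse=True)[:k]

print(top_k(predictions, 1)[0]["token_str"])  # -> fashion
```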
```diff
   'score': 0.09747550636529922,
   'token': 10533,
   'token_str': 'carpenter'},
  {'sequence': '[CLS] the man worked as a salesman. [SEP]',
   'score': 0.037680890411138535,
   'token': 18968,
```
```diff
   'score': 0.21981462836265564,
   'token': 6821,
   'token_str': 'nurse'},
  {'sequence': '[CLS] the woman worked as a cook. [SEP]',
   'score': 0.03042375110089779,
   'token': 5660,
```
```diff
 optimize the training of the models.
 
 The following assumptions were considered:
+- The Data Entry base is obtained from the result of goal 4;
+- Labeling (goal 4) is considered true for accuracy measurement purposes;
+- Preprocessing experiments compare accuracy in a shallow neural network (SNN);
+- Pre-processing was investigated for the classification goal.
 
+From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL), a Notebook was developed in [Google Colab](colab.research.google.com)
 to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-
+processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), which can also be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
```
```diff
 | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
+As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), code was created in the pre-processing step to build and evaluate 8 (eight) different
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
```
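The project's actual pre-processing code lives in the notebook linked above. As a rough, hypothetical illustration of the standardization steps the text describes (lowercasing, punctuation stripping, stopword removal — the stopword list below is a toy subset, not the project's):

```python
import re

# Toy stopword list for illustration only; the project's pipeline uses
# the packages listed in the table above (e.g. NLTK-style stopwords).
TOY_STOPWORDS = {"the", "a", "an", "of", "to", "and", "in"}

def standardize(text: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    tokens = [t for t in text.split() if t not in TOY_STOPWORDS]
    return " ".join(tokens)

print(standardize("The training of the Models, in English!"))
# -> training models english
```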
| Base | Original texts |