The detailed release history can be found [here](https://huggingface.co/unb-lamfo-nlp-mcti) on Hugging Face.
#### Table 1:

| Model                        | #params | Language |
|------------------------------|:-------:|:--------:|
| [`mcti-base-uncased`]        |  110M   | English  |
| [`mcti-large-cased`]         |  110M   | Chinese  |
| [`-base-multilingual-cased`] |  110M   | Multiple |
#### Table 2:

| Dataset      | Compatibility to base* |
|--------------|:----------------------:|
| Labeled MCTI |          100%          |
Several Python packages were used to develop the preprocessing code:

#### Table 3: Python packages used

| Objective                                    | Package |
|----------------------------------------------|---------|
| Resolve contractions and slang usage in text | [contractions](https://pypi.org/project/contractions) |
As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento), pre-processing code was written to build and evaluate eight different bases, derived from the goal 4 base, by applying the methods shown in Figure 2.

#### Table 4: Preprocessing methods evaluated

| id   | Experiments    |
|------|----------------|
| Base | Original Texts |
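Only the Base row of Table 4 is shown in this excerpt; the remaining experiments apply the methods of Figure 2 on top of the original texts. As a hypothetical sketch of how such derived bases could be produced (the transform names and experiment ids below are illustrative assumptions, not the project's actual methods):

```python
# Hypothetical sketch: each experiment id maps to a chain of text
# transforms applied over the original (Base) texts to produce a
# derived base. The transforms here are illustrative assumptions.
import re

def lowercase(text: str) -> str:
    return text.lower()

def strip_punctuation(text: str) -> str:
    # Remove everything that is not a word character or whitespace.
    return re.sub(r"[^\w\s]", "", text)

EXPERIMENTS = {
    "Base": [],                             # original texts, unchanged
    "xp1": [lowercase],                     # hypothetical derived base
    "xp2": [lowercase, strip_punctuation],  # hypothetical derived base
}

def build_base(texts, transforms):
    for fn in transforms:
        texts = [fn(t) for t in texts]
    return texts

corpus = ["Funding Opportunity: AI research, apply NOW!"]
for exp_id, chain in EXPERIMENTS.items():
    print(exp_id, build_base(corpus, chain))
```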
#### Table 5: Results obtained in Preprocessing

| id   | Experiment     | accuracy | f1-score | recall | precision | mean (s) | n_tokens | max_length |
|------|----------------|----------|----------|--------|-----------|----------|----------|------------|
| Base | Original Texts | 89.78%   | 84.20%   | 79.09% | 90.95%    | 417.772  | 23788    | 5636       |
obtained results with related metrics. With this implementation, we achieved new levels of accuracy: 86% for the CNN architecture and 88% for the LSTM architecture.

#### Table 6: Results from Pre-trained WE + ML models

| ML Model | Accuracy | F1 Score | Precision | Recall |
|:--------:|:--------:|:--------:|:---------:|:------:|
|    NN    |  0.8269  |  0.8545  |  0.8392   | 0.8712 |
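The metrics in Tables 5 and 6 follow the standard definitions: F1 is the harmonic mean of precision and recall (for the NN row, 2 × 0.8392 × 0.8712 / (0.8392 + 0.8712) ≈ 0.855, matching the reported F1 Score up to rounding). A plain-Python sketch on an illustrative binary prediction vector (not the MCTI data):

```python
# Compute accuracy, precision, recall, and F1 from binary labels.
# The example vectors are illustrative, not the project's data.
def binary_metrics(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
```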