Update README.md
README.md CHANGED
@@ -23,31 +23,18 @@ thumbnail: https://github.com/Marcosdib/S2Query/Classification_Architecture_mode

Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.

-
-
-
-
-
+The model [NLP MCTI Classification Multi](https://huggingface.co/spaces/unb-lamfo-nlp-mcti/NLP-W2V-CNN-Multi) is part of the project [Research Financing Product Portfolio (FPP)](https://huggingface.co/unb-lamfo-nlp-mcti), which focuses
+on the task of Text Classification and explores different machine learning strategies to classify a small amount
+of long, unstructured, and uneven data to find a proper method with good performance. Pre-training and word embedding
+solutions were used to learn word relationships from other datasets with considerable similarity and larger scale.
+Then, using the acquired resources, based on the dataset available in the MCTI, transfer learning plus deep learning
+models were applied to improve the understanding of each sentence.

## According to the abstract,

-
-
-
-different machine learning strategies to classify a small amount of long, unstructured, and uneven data to find a proper method
-with good performance. The collected data includes texts of financing opportunities the international R&D funding organizations
-provided on their websites. The main goal is to find international R&D funding eligible for Brazilian researchers, sponsored by
-the Ministry of Science, Technology and Innovation. We use pre-training and word embedding solutions to learn the relationship
-of the words from other datasets with considerable similarity and larger scale. Then, using the acquired features, based on the
-available dataset from MCTI, we apply transfer learning plus deep learning models to improve the comprehension of each sentence.
-Compared to the baseline accuracy rate of 81%, based on the available datasets, and the 85% accuracy rate achieved through a
-Transformer-based approach, the Word2Vec-based approach improved the accuracy rate to 88%. The research results serve as
-a successful case of artificial intelligence in a federal government application.
-
-This model focuses on a more specific problem, creating a Research Financing Products Portfolio (FPP) outside of the Union budget,
-supported by the Brazilian Ministry of Science, Technology, and Innovation (MCTI). It was introduced in ["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318) and first released in
-[this repository](https://huggingface.co/unb-lamfo-nlp-mcti). This model is uncased: it does not make a difference between english
-and English.
+Compared to the 81% baseline accuracy rate based on available datasets and the 85% accuracy rate achieved using a
+Transformer-based approach, the Word2Vec-based approach improved the accuracy rate to 93%, according to
+["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318).

## Model description

@@ -159,6 +146,9 @@ output = model(encoded_input)

### Limitations and bias

+This model is uncased: it does not make a difference between english
+and English.
+
Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
predictions:
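
For context, the Word2Vec-plus-CNN pipeline that the updated description refers to can be sketched as follows. This is a minimal illustration rather than the project's actual training code: it assumes gensim and TensorFlow/Keras, and the file names, embedding size, sequence length, and network shape are hypothetical placeholders. Texts are lowercased before tokenization, in line with the "uncased" note above.

```python
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pre-train Word2Vec on a larger, similar corpus (the transfer source).
# "similar_corpus.txt" is a hypothetical placeholder: one document per line.
with open("similar_corpus.txt", encoding="utf-8") as f:
    source_sentences = [line.lower().split() for line in f]
w2v = Word2Vec(sentences=source_sentences, vector_size=300, window=5, min_count=2, workers=4)

# Tokenize the small labeled set of funding-opportunity texts (placeholder files again).
with open("mcti_opportunities.txt", encoding="utf-8") as f:
    texts = [line.lower() for line in f]           # uncased: lowercase everything
labels = np.loadtxt("mcti_labels.txt", dtype=int)  # one integer class id per text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=500)

# Build an embedding matrix from the pre-trained vectors: this is the transferred knowledge.
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, 300))
for word, idx in tokenizer.word_index.items():
    if word in w2v.wv:
        embedding_matrix[idx] = w2v.wv[word]

# CNN classifier on top of the frozen Word2Vec embeddings.
num_classes = int(labels.max()) + 1
model = models.Sequential([
    layers.Embedding(vocab_size, 300, weights=[embedding_matrix],
                     input_length=500, trainable=False),
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=10, validation_split=0.2)
```

Freezing the embedding layer is what makes this transfer learning in the sense described above: the word relationships come from the larger corpus, while only the convolutional and dense layers are fit to the small labeled dataset.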