MarcosDib committed on
Commit
bdba148
1 Parent(s): c6cd2eb

Update README.md

Files changed (1)
  1. README.md +12 -22
README.md CHANGED
@@ -23,31 +23,18 @@ thumbnail: https://github.com/Marcosdib/S2Query/Classification_Architecture_model.png
 
  Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.
 
- This project focuses on a specific problem: creating a Research Financing Products Portfolio (FPP) outside
- of the Union budget, supported by the Brazilian Ministry of Science, Technology, and Innovation (MCTI). The problem
- description and conceptual model of FPP/MCTI are shown in Figure 1.
-
- ![Model](https://github.com/Marcosdib/S2Query/Classification_Architecture_model.png)
+ The model [NLP MCTI Classification Multi](https://huggingface.co/spaces/unb-lamfo-nlp-mcti/NLP-W2V-CNN-Multi), part of the project [Research Financing Product Portfolio (FPP)](https://huggingface.co/unb-lamfo-nlp-mcti), focuses
+ on the task of Text Classification and explores different machine learning strategies to classify a small amount
+ of long, unstructured, and uneven data to find a proper method with good performance. Pre-training and word embedding
+ solutions were used to learn word relationships from other datasets with considerable similarity and larger scale.
+ Then, using the acquired resources, based on the dataset available in the MCTI, transfer learning plus deep learning
+ models were applied to improve the understanding of each sentence.
 
  ## According to the abstract
 
- Text classification is a traditional problem in Natural Language Processing (NLP). Most of the state-of-the-art implementations
- require high-quality, voluminous, labeled data. Pre-trained models on large corpora have proven beneficial for text classification
- and other NLP tasks, but they can only take a limited amount of symbols as input. This is a real case study that explores
- different machine learning strategies to classify a small amount of long, unstructured, and uneven data to find a proper method
- with good performance. The collected data includes texts of financing opportunities the international R&D funding organizations
- provided on their websites. The main goal is to find international R&D funding eligible for Brazilian researchers, sponsored by
- the Ministry of Science, Technology and Innovation. We use pre-training and word embedding solutions to learn the relationship
- of the words from other datasets with considerable similarity and larger scale. Then, using the acquired features, based on the
- available dataset from MCTI, we apply transfer learning plus deep learning models to improve the comprehension of each sentence.
- Compared to the baseline accuracy rate of 81%, based on the available datasets, and the 85% accuracy rate achieved through a
- Transformer-based approach, the Word2Vec-based approach improved the accuracy rate to 88%. The research results serve as
- a successful case of artificial intelligence in a federal government application.
-
- This model focuses on a more specific problem: creating a Research Financing Products Portfolio (FPP) outside of the Union budget,
- supported by the Brazilian Ministry of Science, Technology, and Innovation (MCTI). It was introduced in ["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318) and first released in
- [this repository](https://huggingface.co/unb-lamfo-nlp-mcti). This model is uncased: it does not make a difference between english
- and English.
+ Compared to the 81% baseline accuracy rate based on available datasets and the 85% accuracy rate achieved using a
+ Transformer-based approach, the Word2Vec-based approach improved the accuracy rate to 93%, according to
+ ["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318).
 
  ## Model description
 
@@ -159,6 +146,9 @@ output = model(encoded_input)
 
  ### Limitations and bias
 
+ This model is uncased: it does not make a difference between english
+ and English.
+
  Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
  predictions:
 
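
The added description (uncased input, pre-trained word embeddings mean-pooled into sentence features, then a classifier) can be sketched in miniature. This is an illustrative toy only, not the actual W2V-CNN model: the embedding table, vocabulary, and the nearest-centroid rule standing in for the CNN classifier are all invented for the example.

```python
# Toy sketch of the described pipeline: uncased tokenization, mean-pooled
# word embeddings as sentence features, and a classifier on top.
# All vectors and class names below are made up for illustration.
from math import sqrt

# Hypothetical 3-dimensional "pre-trained" word embeddings.
EMB = {
    "funding": [0.9, 0.1, 0.0],
    "grant": [0.8, 0.2, 0.1],
    "research": [0.7, 0.3, 0.2],
    "deadline": [0.1, 0.9, 0.0],
    "closed": [0.0, 0.8, 0.3],
    "expired": [0.1, 0.7, 0.4],
}

def embed(text):
    """Uncased mean-pooled sentence vector; out-of-vocabulary words are skipped."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(3)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Tiny labeled "dataset": one centroid per class (stand-in for the trained CNN).
CENTROIDS = {
    "eligible": embed("research grant funding"),
    "not_eligible": embed("deadline closed expired"),
}

def classify(text):
    return max(CENTROIDS, key=lambda c: cosine(embed(text), CENTROIDS[c]))

print(classify("New FUNDING for research"))  # → eligible
```

Because classification runs on `text.lower()`, "FUNDING" and "funding" map to the same vector, which is what the card's "uncased" note means in practice.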