MarcosDib committed on
Commit 08ac5e2
1 Parent(s): 6270a7a

Update README.md

Files changed (1)
  1. README.md +66 -16
README.md CHANGED
@@ -32,8 +32,8 @@ models were applied to improve the understanding of each sentence.
 
 ## According to the abstract,
 
- Compared to the 81% baseline accuracy rate based on available datasets and the 85% accuracy rate achieved using a
- Transformer-based approach, the Word2Vec-based approach improved the accuracy rate to 93%, according to
  ["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318).
 
 ## Model description
@@ -60,20 +60,55 @@ redundancies in the analysis of the inputs.
 
 ## Model variations
 
- With the motivation to increase accuracy obtained with baseline implementation, was implemented a transfer learning
- strategy under the assumption that small data available for training was insufficient for adequate embedding training.
- In this context, was considered two approaches:
-
- - Pre-training word embeddings using similar datasets for text classification;
- - Using transformers and attention mechanisms (Longformer) to create contextualized embeddings.
-
- Templates using Word2Vec and Longformer also need to be loaded and their weights are as follows:
-
- Table 1: Templates using Word2Vec and Longformer
- | Tamplates | weights |
- |------------------------------|:-------:|
- | Longformer | 10.9GB |
- | Word2Vec | 56.1MB |
 
 ## Intended uses
 
@@ -265,6 +300,21 @@ The results obtained surpassed those achieved in goal 6 and goal 9, with the bes
 in the longformer + CNN model. We can also observe that the models that achieved the best results were those
 that used the CNN network for deep learning.
 
 In addition, it was possible to notice that the model of longformer + SNN and longformer + LSTM were not able
 to learn. Perhaps the models need some adjustments, but each training attempt took between 5 and 8 hours, which
 made it impossible to try to adjust when other models were already showing promising results.
 
 ## According to the abstract,
 
+ "The research results serve as a successful case of artificial intelligence in a federal government application".
+ More details about the project, the model architecture, the model training, and the classification process can be found in the article
  ["Using transfer learning to classify long unstructured texts with small amounts of labeled data"](https://www.scitepress.org/Link.aspx?doi=10.5220/0011527700003318).
 
 ## Model description
 
 ## Model variations
 
+ Table 1 below presents the results of several implementations with different architectures, highlighting the
+ accuracy, F1-score, recall, and precision obtained in the training of each network.
+
+ Table 1: Results of experiments
+ | Model                  | Accuracy (%) | F1-score (%) | Recall (%) | Precision (%) |
+ |------------------------|--------------|--------------|------------|---------------|
+ | Keras Embedding + SNN  | 92.47        | 88.46        | 79.66      | 100.00        |
+ | Keras Embedding + DNN  | 89.78        | 84.41        | 77.81      | 92.57         |
+ | Keras Embedding + CNN  | 93.01        | 89.91        | 85.18      | 95.69         |
+ | Keras Embedding + LSTM | 93.01        | 88.94        | 83.32      | 95.54         |
+ | Word2Vec + SNN         | 89.25        | 83.82        | 74.15      | 97.10         |
+ | Word2Vec + DNN         | 90.32        | 86.52        | 85.18      | 88.70         |
+ | Word2Vec + CNN         | 92.47        | 88.42        | 80.85      | 98.72         |
+ | Word2Vec + LSTM        | 89.78        | 84.36        | 75.36      | 95.81         |
+ | Longformer + SNN       | 61.29        | 0            | 0          | 0             |
+ | Longformer + DNN       | 91.93        | 87.62        | 80.37      | 97.62         |
+ | Longformer + CNN       | 94.09        | 90.69        | 83.41      | 100.00        |
+ | Longformer + LSTM      | 61.29        | 0            | 0          | 0             |
+
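For reference, a minimal sketch of how the percentage metrics reported in Table 1 could be computed with scikit-learn; the label arrays below are placeholders, not project data:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder ground-truth labels and model predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

print("Accuracy :", 100 * accuracy_score(y_true, y_pred))
print("F1-score :", 100 * f1_score(y_true, y_pred))
print("Recall   :", 100 * recall_score(y_true, y_pred))
print("Precision:", 100 * precision_score(y_true, y_pred))
```
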
+ Table 2 below shows the time required to train each epoch, the validation execution time, and the weight of the
+ deep learning model associated with each implementation.
+
+ Table 2: Training time, validation time, and model weight per implementation
+ | Model                  | Training time per epoch (s) | Validation time (s) | Weight (MB) |
+ |------------------------|:---------------------------:|:-------------------:|:-----------:|
+ | Keras Embedding + SNN  | 0.2                         | 0.7                 | 1.8         |
+ | Keras Embedding + DNN  | 1.0                         | 1.4                 | 7.6         |
+ | Keras Embedding + CNN  | 0.4                         | 1.1                 | 3.2         |
+ | Keras Embedding + LSTM | 1.4                         | 2.0                 | 1.8         |
+ | Word2Vec + SNN         | 1.4                         | 1.2                 | 9.6         |
+ | Word2Vec + DNN         | 2.0                         | 6.8                 | 7.8         |
+ | Word2Vec + CNN         | 1.9                         | 3.4                 | 4.7         |
+ | Word2Vec + LSTM        | 2.6                         | 14.3                | 1.2         |
+ | Longformer + SNN       | 128.0                       | 1.5                 | 36.8        |
+ | Longformer + DNN       | 81.0                        | 8.4                 | 12.7        |
+ | Longformer + CNN       | 57.0                        | 4.5                 | 9.6         |
+ | Longformer + LSTM      | 13.0                        | 8.6                 | 2.6         |
+
+ In addition, it is possible to notice that the Longformer + SNN and Longformer + LSTM models were not able to
+ learn. Perhaps these models need some adjustments; however, each training attempt took between 5 and 8 hours,
+ which made further tuning unfeasible while other models were already showing promising results.
+
+ With Longformer, the problems caused by the size of the model became more visible. First, it was necessary to
+ actively deallocate unused chunks of memory right after use so that the next steps could be loaded. Then, it was
+ necessary to use a CPU environment for training the networks, because the weight of the model exceeded the 16 GB
+ of video memory available on the P100 board provided by Colab during training. In this case, the high-RAM
+ environment was used, which provides 25 GB of memory for the CPU; this means a longer training time, since a GPU
+ performs matrix operations faster than a CPU. These models were trained 5 times, with 100 training epochs each.
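The explicit memory deallocation described above can be illustrated with a minimal, hypothetical PyTorch sketch; the function, the chunking scheme, and the variable names are assumptions rather than the project's actual code:

```python
import gc
import torch

def embed_in_chunks(model, tokenized_chunks, device="cpu"):
    """Run the encoder chunk by chunk, freeing large intermediates right after use."""
    embeddings = []
    for chunk in tokenized_chunks:  # each chunk: dict of tensors from a tokenizer
        with torch.no_grad():
            output = model(**{k: v.to(device) for k, v in chunk.items()})
        # Keep only the [CLS] embedding and move it off the device.
        embeddings.append(output.last_hidden_state[:, 0, :].cpu())
        # Actively deallocate the large activation tensors before the next step.
        del output
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return torch.cat(embeddings, dim=0)
```
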
 
 ## Intended uses
 
 in the longformer + CNN model. We can also observe that the models that achieved the best results were those
 that used the CNN network for deep learning.
 
+ With the motivation of increasing the accuracy obtained with the baseline implementation, a transfer learning
+ strategy was implemented, under the assumption that the small amount of data available for training was
+ insufficient for adequate embedding training. In this context, two approaches were considered:
+
+ - Pre-training word embeddings using similar datasets for text classification (a minimal sketch follows this list);
+ - Using transformers and attention mechanisms (Longformer) to create contextualized embeddings.
+
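The first approach can be sketched with gensim; the corpus, hyperparameters, and output file name below are illustrative assumptions, not the project's actual configuration:

```python
from gensim.models import Word2Vec

# Tiny placeholder corpus: in practice, tokenized documents from similar
# text-classification datasets would be used for pre-training.
similar_corpus = [
    ["research", "project", "funding", "innovation"],
    ["technology", "transfer", "industry", "partnership"],
]

# Pre-train Word2Vec embeddings on the auxiliary corpus (illustrative hyperparameters).
w2v = Word2Vec(sentences=similar_corpus, vector_size=300, window=5, min_count=1, workers=4)

# Save the vectors so they can later be loaded as embedding weights for the classifier.
w2v.wv.save_word2vec_format("word2vec_embeddings.bin", binary=True)
```
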
+ The pre-trained Word2Vec and Longformer models also need to be loaded, and their weights are as follows
+ (a loading sketch is shown below the table):
+
+ Table 3: Weights of the pre-trained Word2Vec and Longformer models
+ | Model      | Weight  |
+ |------------|:-------:|
+ | Longformer | 10.9 GB |
+ | Word2Vec   | 56.1 MB |
+
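As a hedged illustration of loading these two pre-trained models, the snippet below uses gensim and Hugging Face Transformers; the file name and the Longformer checkpoint identifier are assumptions, not necessarily the artifacts used in this project:

```python
from gensim.models import KeyedVectors
from transformers import LongformerModel, LongformerTokenizer

# Word2Vec vectors saved earlier: loaded as a lookup table for the embedding layer.
word_vectors = KeyedVectors.load_word2vec_format("word2vec_embeddings.bin", binary=True)

# Longformer tokenizer and encoder used to build contextualized embeddings for long texts.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
longformer = LongformerModel.from_pretrained("allenai/longformer-base-4096")
```
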
 In addition, it was possible to notice that the model of longformer + SNN and longformer + LSTM were not able
 to learn. Perhaps the models need some adjustments, but each training attempt took between 5 and 8 hours, which
 made it impossible to try to adjust when other models were already showing promising results.