Update README.md
## Model description

This Automatic Text Summarization (ATS) model was developed in Python to be applied to the Research Financing Products Portfolio (FPP) of the Brazilian Ministry of Science, Technology and Innovation. It was produced in parallel with the writing of a Systematic Literature Review paper, which discusses many summarization methods, datasets, and evaluators, as well as giving a brief overview of the nature of the task itself and the state of the art of its implementation.

The model's input can be a single text, a dataframe, or a CSV file containing multiple texts (in English), and its outputs are the summarized texts and their evaluation metrics. As an optional (although recommended) input, the model accepts gold-standard summaries for the texts, i.e., human-written (or extracted) summaries which are considered good representations of their contents. Evaluators like ROUGE, which in its many variations is the most widely used for the task, require gold-standard summaries as inputs. There are, however, evaluation methods that do not depend on the existence of a gold-standard summary (e.g., the cosine similarity method and the Kullback-Leibler divergence method), which is why an evaluation can be made even when only the text is given as input to the model.
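To illustrate the reference-free idea, cosine similarity between bag-of-words vectors of the source text and its summary can be computed with the standard library alone. This is a minimal sketch of the general technique, not the model's actual implementation; the function name and tokenization are illustrative assumptions:

```python
import math
from collections import Counter

def cosine_similarity(text: str, summary: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts.

    Illustrative sketch: tokenization is a plain lowercase split,
    with no stemming or stop-word removal.
    """
    a, b = Counter(text.lower().split()), Counter(summary.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

score = cosine_similarity(
    "the model summarizes research financing documents",
    "the model summarizes documents",
)
```

A score near 1.0 indicates the summary's vocabulary closely matches the source, which is why no human-written reference is needed.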
The text output is produced by a chosen ATS method, which can be extractive (built from the most relevant sentences of the source document) or abstractive (written from scratch by the model). The latter is achieved by means of transformers; the ones present in the model are the already existing and widely applied BART-Large CNN, Pegasus-XSUM and mT5 Multilingual XLSUM. The extractive methods are taken from the Sumy Python library and include SumyRandom, SumyLuhn, SumyLsa, SumyLexRank, SumyTextRank, SumySumBasic, SumyKL and SumyReduction. Each of the methods used for text summarization will be described individually in the following sections.
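The extractive idea (selecting the most relevant sentences of the source) can be sketched in a few lines of pure Python with Luhn-style word-frequency scoring. This is a simplified illustration of the approach, not Sumy's actual code; the function name and the regex-based sentence splitting are assumptions made for the example:

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Keep the n sentences whose words are most frequent in the source.

    Luhn-style sketch: each sentence is scored by the summed corpus
    frequency of its words, and the top sentences are emitted in their
    original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> int:
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Restore document order so the summary reads naturally.
    top.sort(key=sentences.index)
    return " ".join(top)

doc = (
    "The portfolio funds research projects. "
    "Research projects produce reports. "
    "The weather was pleasant that day. "
    "Summaries of research reports help reviewers."
)
summary = extractive_summary(doc, n_sentences=2)
```

Real extractive methods differ mainly in how sentences are scored: LexRank and TextRank use graph centrality, LSA uses matrix decomposition, and KL minimizes divergence between the summary's and the source's word distributions.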