m3hrdadfi committed
Commit b8a4df0 · 1 Parent(s): 4ae70a5

Add tags, restructure info

  1. README.md +3 -36
README.md CHANGED
@@ -17,37 +17,6 @@ All the models (downstream tasks) are uncased and trained with whole word maskin
  This task aims to extract named entities in the text, such as names and label with appropriate `NER` classes such as locations, organizations, etc. The datasets used for this task contain sentences that are marked with `IOB` format. In this format, tokens that are not part of an entity are tagged as `”O”` the `”B”`tag corresponds to the first word of an object, and the `”I”` tag corresponds to the rest of the terms of the same entity. Both `”B”` and `”I”` tags are followed by a hyphen (or underscore), followed by the entity category. Therefore, the NER task is a multi-class token classification problem that labels the tokens upon being fed a raw text. There are two primary datasets used in Persian NER, `ARMAN`, and `PEYMA`. In ParsBERT, we prepared ner for both datasets as well as a combination of both datasets.
  
  
-
- ### PEYMA
-
- PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens from which 41,148 tokens are tagged with seven different classes.
-
- 1. Organization
- 2. Money
- 3. Location
- 4. Date
- 5. Time
- 6. Person
- 7. Percent
-
-
- | Label | # |
- |:------------:|:-----:|
- | Organization | 16964 |
- | Money | 2037 |
- | Location | 8782 |
- | Date | 4259 |
- | Time | 732 |
- | Person | 7675 |
- | Percent | 699 |
-
-
-
- **Download**
- You can download the dataset from [here](http://nsurl.org/tasks/task-7-named-entity-recognition-ner-for-farsi/)
-
- ---
-
  ### ARMAN
  
  ARMAN dataset holds 7,682 sentences with 250,015 sentences tagged over six different classes.
@@ -80,11 +49,9 @@ You can download the dataset from [here](https://github.com/HaniehP/PersianNER)
  
  The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
  
- | Dataset | ParsBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
- |:---------------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:|
- | ARMAN + PEYMA | 95.13* | - | - | - | - | - |
- | PEYMA | 98.79* | - | 90.59 | - | 84.00 | - |
- | ARMAN | 93.10* | 89.9 | 84.03 | 86.55 | - | 77.45 |
+ | Dataset | ParsBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
+ |---------|----------|------------|--------------|----------|----------------|------------|
+ | ARMAN | 93.10* | 89.9 | 84.03 | 86.55 | - | 77.45 |
  
  
  ## How to use :hugs:
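
The README context in the first hunk describes the `IOB` scheme: `O` for tokens outside any entity, `B` for the first token of an entity, `I` for the remaining tokens, with the entity category attached after a hyphen or underscore. A minimal sketch of how such tags could be grouped back into entity spans is shown below. The function name `iob_to_spans`, the label names `PER` and `LOC`, and the sample sentence are illustrative assumptions; they are not taken from the ParsBERT code or from the ARMAN/PEYMA label sets.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (category, text) spans.

    Hypothetical helper for illustration; not part of ParsBERT.
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        # The prefix (B or I) and the category are separated by a single
        # character, a hyphen or an underscore, e.g. "B-LOC" or "I_LOC".
        prefix, category = tag[0], tag[2:]
        # Start a new span on "B", or leniently on an "I" whose category
        # does not continue the span currently being built.
        if prefix == "B" or category != current_type:
            flush()
            current_type = category
        current_tokens.append(token)

    flush()
    return spans


# Assumed example sentence and labels, for illustration only.
tokens = ["Ali", "lives", "in", "Tehran", "Province"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
print(iob_to_spans(tokens, tags))
```

The sketch treats a stray `I` tag as the start of a new entity rather than raising an error; a stricter decoder might reject such sequences instead.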