vukosi committed on
Commit b15142e · 1 Parent(s): 00e38c4

Update README.md

  1. README.md +26 -7
README.md CHANGED
@@ -49,6 +49,19 @@ tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTaJW300')

  ## Downstream Performance

+ ### Daily News Dikgang
+
+ Learn more about the dataset in the [Dataset Folder](daily-news-dikgang)
+
+ | **Model** | **5-fold Cross Validation F1** | **Test F1** |
+ |-----------------------------|--------------------------------------|-------------------|
+ | Logistic Regression + TFIDF | 60.1 | 56.2 |
+ | NCHLT TSN RoBERTa | 64.7 | 60.3 |
+ | PuoBERTa | **63.8** | **62.9** |
+ | PuoBERTaJW300 | 66.2 | 65.4 |
+
+ Downstream News Categorisation model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-News](https://huggingface.co/dsfsi/PuoBERTa-News)
+
  ### MasakhaPOS

  Performance of models on the MasakhaPOS downstream task.
@@ -62,8 +75,10 @@ Performance of models on the MasakhaPOS downstream task.
  | AfroXLMR-large | 83.0 |
  | **Monolingual Models** | |
  | NCHLT TSN RoBERTa | 82.3 |
- | PuoBERTa | 83.4 |
- | PuoBERTa+JW300 | **84.1** |
+ | PuoBERTa | **83.4** |
+ | PuoBERTa+JW300 | 84.1 |
+
+ Downstream POS model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-POS](https://huggingface.co/dsfsi/PuoBERTa-POS)

  ### MasakhaNER

@@ -77,16 +92,20 @@ Performance of models on the MasakhaNER downstream task.
  | AfroXLMR-large | 89.4 |
  | **Monolingual Models** | |
  | NCHLT TSN RoBERTa | 74.2 |
- | PuoBERTa | 78.2 |
- | PuoBERTa+JW300 | **80.2** |
+ | PuoBERTa | **78.2** |
+ | PuoBERTa+JW300 | 80.2 |
+
+ Downstream NER model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-NER](https://huggingface.co/dsfsi/PuoBERTa-NER)
+
+ ## Pre-Training Dataset

- ## Dataset
+ We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.

- We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.\\
+ [Github](https://github.com/dsfsi/PuoData), 🤗 [https://huggingface.co/datasets/dsfsi/PuoData](https://huggingface.co/datasets/dsfsi/PuoData)

  ## Citation Information

- Bibtex Refrence
+ Bibtex Reference

  ```
  @inproceedings{marivate2023puoberta,