ThuanPhong committed on
Commit
b4b6358
·
verified ·
1 Parent(s): 7ac99d6

Update README.md

Files changed (1)
  1. README.md +32 -0
README.md CHANGED
@@ -7,3 +7,35 @@ widget:
  - text: "Cà phê được trồng nhiều ở khu vực Tây <mask> của Việt Nam."
    example_title: "Example 2"
  ---
+
+
+ # <a name="introduction"></a> CafeBERT: A Pre-Trained Language Model for Vietnamese (NAACL-2024 Findings)
+
+ CafeBERT is a pre-trained, state-of-the-art language model for Vietnamese *(cafe, or coffee, is a popular morning drink in Vietnam)*.
+
+ CafeBERT is a large-scale multilingual language model with strong support for Vietnamese. The model is based on XLM-RoBERTa (a state-of-the-art multilingual language model) and is enhanced with a large Vietnamese corpus spanning many domains, such as Wikipedia and newspapers. CafeBERT performs strongly on the VLUE benchmark and on other tasks, including machine reading comprehension, text classification, natural language inference, and part-of-speech tagging.
+
+ The general architecture and experimental results of CafeBERT can be found in our paper (NAACL 2024 Findings).
+
+ Please **CITE** our paper when CafeBERT is used to help produce published results or is incorporated into other software.
+
+ **Installation**
+
+ Install the `transformers` and `sentencepiece` packages (the usage example also imports `torch`, so PyTorch must be installed as well):
+
+ pip install transformers
+ pip install sentencepiece
+
+ **Example usage**
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ # Load the pre-trained CafeBERT encoder and its tokenizer from the Hugging Face Hub
+ model = AutoModel.from_pretrained('uitnlp/CafeBERT')
+ tokenizer = AutoTokenizer.from_pretrained('uitnlp/CafeBERT')
+
+ # Tokenize a Vietnamese sentence and return PyTorch tensors
+ encoding = tokenizer('Cà phê được trồng nhiều ở khu vực Tây Nguyên của Việt Nam.', return_tensors='pt')
+
+ # Run a forward pass without tracking gradients (inference only)
+ with torch.no_grad():
+     output = model(**encoding)
+ ```
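
The usage example stops at the raw model output. One common way to turn per-token vectors into a single sentence vector is masked mean pooling. The sketch below is an illustration only: `mean_pool` is a hypothetical helper that is not part of this README or of `transformers`, and nothing here implies that the CafeBERT authors use this pooling strategy. It runs on small dummy tensors so that it works without downloading the model.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padded positions.

    Hypothetical helper for illustration; not part of the CafeBERT release.
    """
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Sum only the vectors of real (non-padding) tokens
    summed = (last_hidden_state * mask).sum(dim=1)
    # Clamp the token count to avoid dividing by zero for an all-padding row
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Dummy tensors standing in for the model's last hidden state and the tokenizer's attention mask
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])  # (batch=1, seq_len=3, hidden=2)
mask = torch.tensor([[1, 1, 0]])  # the last position is padding and is ignored
sentence_vector = mean_pool(hidden, mask)
print(sentence_vector)  # tensor([[2., 3.]])
```

With the tensors from the usage example, the equivalent call would be `mean_pool(output.last_hidden_state, encoding['attention_mask'])`.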