aadelucia committed on
Commit
269f2cf
·
1 Parent(s): 54b3395

Update README.md

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -1,3 +1,59 @@
  ---
  license: mit
  ---
+
+ # Bernice
+
+ Bernice is a multilingual pre-trained encoder designed exclusively for Twitter data.
+ The model was released with the EMNLP 2022 paper *Bernice: A Multilingual Pre-trained Encoder for Twitter* by Alexandra DeLucia, Shijie Wu, Aaron Mueller, Carlos Aguirre, Mark Dredze, and Philip Resnik.
+
+ This model card will contain more information *soon*. Please reach out to Alexandra DeLucia (aadelucia at jhu.edu) or open an issue if you have questions.
+
+ # Model description
+ TBD
+
+ ## Training data
+ TBD
+
+ ## Training procedure
+ TBD
+
+ ## Evaluation results
+ TBD
+
+ # How to use
+ You can use this model for tweet representation. To use it with the Hugging Face PyTorch interface:
+
+ ```python
+ import re
+
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+
+ # Load model and tokenizer
+ model = AutoModel.from_pretrained("bernice")
+ tokenizer = AutoTokenizer.from_pretrained("bernice", model_max_length=128)
+
+ # Data
+ raw_tweets = [
+     "So, Nintendo and Illimination's upcoming animated #SuperMarioBrosMovie is reportedly titled 'The Super Mario Bros. Movie'. Alrighty. :)",
+     "AMLO se vio muy indignado porque propusieron al presidente de Ucrania para el premio nobel de la paz. ¿Qué no hay otros que luchen por la paz? ¿Acaso se quería proponer él?"
+ ]
+
+ # Pre-process tweets for tokenizer: mask user handles and URLs
+ URL_RE = re.compile(r"https?:\/\/[\w\.\/\?\=\d&#%_:/-]+")
+ HANDLE_RE = re.compile(r"@\w+")
+ tweets = []
+ for t in raw_tweets:
+     t = HANDLE_RE.sub("@USER", t)
+     t = URL_RE.sub("HTTPURL", t)
+     tweets.append(t)
+
+ # Tokenize the batch, then run the encoder without tracking gradients
+ encoded = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**encoded)
+ # Token-level representations, shape (batch, sequence length, hidden size)
+ embeddings = outputs.last_hidden_state
+ ```
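The sample tweets above contain neither a user handle nor a URL, so the masking step leaves them unchanged. As a minimal sketch of what that step does, the snippet below applies the same two regular expressions to a made-up tweet. The `normalize_tweet` helper and the example text are illustrative assumptions, not part of the released code.

```python
import re

# Same patterns as in the usage example above
URL_RE = re.compile(r"https?:\/\/[\w\.\/\?\=\d&#%_:/-]+")
HANDLE_RE = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Mask user handles and URLs (hypothetical helper for illustration)."""
    text = HANDLE_RE.sub("@USER", text)
    text = URL_RE.sub("HTTPURL", text)
    return text


# Made-up tweet containing one handle and one URL
example = "Thanks @jack, the paper is at https://example.com/paper?id=1 now"
print(normalize_tweet(example))
# Thanks @USER, the paper is at HTTPURL now
```

Note that the URL pattern also matches `.` and `:`, so punctuation directly after a link would be absorbed into the `HTTPURL` placeholder.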
+
+
+ # Limitations and bias
+ TBD
+
+ ## BibTeX entry and citation info
+ TBD