Viona committed on
Commit
16fd179
1 Parent(s): 12eaccf

Create README.md

  1. README.md +29 -0
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ language: "en"
+ tags:
+ - agriculture-domain
+ - agriculture
+ widget:
+ - text: "[MASK] agriculture provides one of the most promising areas for innovation in green and blue infrastructure in cities."
+ ---
+ # BERT for Agriculture Domain
+ A BERT-based language model further pre-trained from the [SciBERT](https://huggingface.co/allenai/scibert_scivocab_uncased) checkpoint.
+ The training corpus balances scientific and general works in the agriculture domain, covering knowledge from different areas of agricultural research as well as practical knowledge.
+
+ The corpus contains 1.3 million paragraphs from the National Agricultural Library (NAL) of the US government and 4.2 million paragraphs from books and general literature in the **Agriculture Domain**.
+
+ The model was trained with masked language modeling (MLM), a self-supervised objective.
+ - Masked language modeling (MLM): given a sentence, the training procedure randomly masks 15% of the words in the input,
+ runs the entire masked sentence through the model, and the model has to predict the masked words. This differs from
+ traditional recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive
+ models like GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation
+ of the sentence.
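The masking step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the training code used for this model: it splits on whitespace instead of using the WordPiece tokenizer, masks each word independently with probability 0.15, and the names `mask_tokens` and `MASK_TOKEN` as well as the example sentence are assumptions made for this sketch.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT-style mask symbol


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Mask each token independently with probability mask_prob.

    Returns (masked_tokens, labels). labels holds the original token at
    every masked position and None elsewhere, so a model would only be
    scored on the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Illustrative sentence (hypothetical, not taken from the training corpus).
tokens = "crop rotation improves soil health and reduces pest pressure".split()
masked, labels = mask_tokens(tokens, seed=0)
print(" ".join(masked))
```

Because every position in the masked sentence is visible to the model at once, the prediction for a masked word can draw on context from both its left and its right, which is the bidirectional property mentioned above.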
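+ ```python
+ from transformers import pipeline
+
+ # Build a fill-mask pipeline; model and tokenizer are loaded from the same Hub repository.
+ # Note: the id below reads as a chemical-domain checkpoint; point both arguments at this
+ # repository's model id to query the agriculture model.
+ fill_mask = pipeline(
+     "fill-mask",
+     model="recobo/chemical-bert-uncased",
+     tokenizer="recobo/chemical-bert-uncased",
+ )
+ # Returns a list of candidate fills for [MASK], each with a score.
+ print(fill_mask("we create [MASK]"))
+ ```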
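For a single-mask input, the `fill-mask` pipeline returns a list of dictionaries, each carrying (among other keys) a `token_str` and a `score`. A small helper for reading off the best candidates might look like the sketch below. The helper name `top_candidates` is hypothetical, and the `placeholder` list only mimics the shape of the pipeline output: its tokens and scores are invented for illustration and are not predictions from this model.

```python
def top_candidates(predictions, k=3):
    """Return (token, score) pairs for the k highest-scoring candidates."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], p["score"]) for p in ranked[:k]]


# Placeholder data shaped like fill-mask output; NOT real model predictions.
placeholder = [
    {"token_str": "token_b", "score": 0.25},
    {"token_str": "token_a", "score": 0.50},
    {"token_str": "token_c", "score": 0.125},
]

for token, score in top_candidates(placeholder, k=2):
    print(f"{token}\t{score:.3f}")
```

With a real pipeline, the same helper could be applied directly to the list that a call such as `fill_mask("we create [MASK]")` returns.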