recobo/agriculture-bert-uncased

Viona committed on Oct 8, 2021

Commit 16fd179 (parent: 12eaccf)

Create README.md

Files changed (1)

README.md +29 -0

README.md ADDED

	@@ -0,0 +1,29 @@

+---
+language: "en"
+tags:
+- agriculture-domain
+- agriculture
+widget:
+- text: "[MASK] agriculture provides one of the most promising areas for innovation in green and blue infrastructure in cities."
+---
+# BERT for Agriculture Domain
+A BERT-based language model further pre-trained from the [SciBERT](https://huggingface.co/allenai/scibert_scivocab_uncased) checkpoint.
+The training corpus balances scientific and general works in the agriculture domain, covering knowledge from different areas of agricultural research as well as practical knowledge.
+It contains 1.3 million paragraphs from the US government's National Agricultural Library (NAL) and 4.2 million paragraphs from books and common literature in the **Agriculture Domain**.
+The model was trained with the self-supervised masked language modeling (MLM) objective:
+- Masked language modeling (MLM): given a sentence, 15% of the words in the input are randomly masked, the entire
+  masked sentence is run through the model, and the model has to predict the masked words. This differs from traditional
+  recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models
+  like GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the
+  sentence.
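+```python
+from transformers import pipeline
+
+# Load this model and its tokenizer from the Hugging Face Hub for masked-token prediction.
+fill_mask = pipeline(
+    "fill-mask",
+    model="recobo/agriculture-bert-uncased",
+    tokenizer="recobo/agriculture-bert-uncased",
+)
+# Returns a list of candidate fills, each with a score and the completed sequence.
+print(fill_mask("we create [MASK]"))
+```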
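The masking step described in the MLM bullet can be sketched in plain Python. This is a minimal sketch, not the code used to train this model. The README does not say how the selected tokens were corrupted, so the sketch assumes the scheme from the original BERT paper: of the roughly 15% of positions selected, 80% become `[MASK]`, 10% become a random vocabulary token, and 10% are left unchanged. Hugging Face's `DataCollatorForLanguageModeling` applies the same rule by default. The sketch also assumes whitespace-split words instead of WordPiece tokens, and `mask_tokens` is a hypothetical helper.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Hypothetical MLM corruption step (assumes BERT's 80/10/10 rule).

    Returns the corrupted input and a label list that holds the original
    token at selected positions and None elsewhere (no loss is computed there).
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected: no prediction target here
        labels[i] = tok  # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80% of selected positions: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    sentence = "urban agriculture provides one of the most promising areas for innovation".split()
    corrupted, targets = mask_tokens(sentence, vocab=sentence, seed=0)
    print(corrupted)
    print(targets)
```

Because the model sees context on both sides of each masked position, this objective is what lets it learn the bidirectional representation mentioned above. Keeping some selected tokens unchanged or randomly replaced means the model cannot assume that only `[MASK]` positions need predicting.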