cxllin/StableMed-3b

Question Answering

text-generation


cxllin committed on Dec 16, 2023

Commit 46cfc8b • 1 Parent(s): 002e591

Create README.md

updated model card

Files changed (1)

README.md +49 -0

	@@ -0,0 +1,49 @@

+---
+license: apache-2.0
+datasets:
+- cxllin/medinstructv2
+language:
+- en
+library_name: transformers
+pipeline_tag: question-answering
+tags:
+- medical
+---
+`StableMed` is a 3-billion-parameter decoder-only language model fine-tuned on 18k rows of medical questions for 1 epoch.
+## Usage
+Get started generating text with `StableMed` by using the following code snippet:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the tokenizer and the fine-tuned weights from the same repository.
+tokenizer = AutoTokenizer.from_pretrained("cxllin/StableMed-3b")
+model = AutoModelForCausalLM.from_pretrained(
+  "cxllin/StableMed-3b",
+  trust_remote_code=True,
+  torch_dtype="auto",
+)
+model.cuda()  # requires a CUDA-capable GPU
+
+# Tokenize the prompt and move the tensors to the same device as the model.
+inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to("cuda")
+
+# Sample up to 64 new tokens with temperature and nucleus (top-p) sampling.
+tokens = model.generate(
+  **inputs,
+  max_new_tokens=64,
+  temperature=0.75,
+  top_p=0.95,
+  do_sample=True,
+)
+print(tokenizer.decode(tokens[0], skip_special_tokens=True))
+```
+## Model Architecture
+The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture, with the modifications listed below the following table of model dimensions:
+| Parameters     | Hidden Size | Layers | Heads | Sequence Length |
+|----------------|-------------|--------|-------|-----------------|
+| 2,795,443,200  | 2560        | 32     | 32    | 4096            |
+* **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of head embedding dimensions for improved throughput following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
+* **Normalization**: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms as opposed to RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)).
+* **Tokenizer**: GPT-NeoX ([Black et al., 2022](https://arxiv.org/abs/2204.06745)).
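
The partial rotary scheme described under **Position Embeddings** can be sketched in plain Python. This is a minimal illustration, not the model's actual implementation: the hidden size and head count come from the table, the 25% fraction comes from the bullet, and everything else (the `apply_partial_rotary` helper, the `ROPE_BASE` value of 10000, and the GPT-NeoX-style split of the rotated slice into two halves) is an assumption made for the example.

```python
import math

HIDDEN_SIZE = 2560       # from the architecture table
NUM_HEADS = 32           # from the architecture table
ROTARY_FRACTION = 0.25   # "first 25% of head embedding dimensions"
ROPE_BASE = 10000.0      # assumption: common RoPE default, not stated in the card

HEAD_DIM = HIDDEN_SIZE // NUM_HEADS             # 80 dimensions per head
ROTARY_DIM = int(HEAD_DIM * ROTARY_FRACTION)    # 20 dimensions receive rotation


def apply_partial_rotary(vec, position):
    """Rotate the first ROTARY_DIM entries of one head vector.

    Hypothetical helper: the remaining HEAD_DIM - ROTARY_DIM entries are
    returned unchanged, which is what makes the scheme "partial".
    """
    half = ROTARY_DIM // 2
    rotated_part, passthrough = vec[:ROTARY_DIM], vec[ROTARY_DIM:]
    first, second = rotated_part[:half], rotated_part[half:]
    out_first, out_second = [], []
    for i in range(half):
        # Each pair (first[i], second[i]) is rotated by a position-dependent angle.
        angle = position * ROPE_BASE ** (-2.0 * i / ROTARY_DIM)
        c, s = math.cos(angle), math.sin(angle)
        out_first.append(first[i] * c - second[i] * s)
        out_second.append(second[i] * c + first[i] * s)
    return out_first + out_second + list(passthrough)


query = [1.0] * HEAD_DIM
rotated = apply_partial_rotary(query, position=5)
print(ROTARY_DIM, "of", HEAD_DIM, "dimensions are rotated")
```

Because only 20 of the 80 per-head dimensions go through the sine and cosine products, the remaining 60 skip that work entirely, which is the throughput argument the bullet refers to.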