---
license: apache-2.0
datasets:
- cxllin/medinstructv2
language:
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- medical
---


`StableMed` is a 3-billion-parameter decoder-only language model, fine-tuned from `stabilityai/stablelm-3b-4e1t` for one epoch on 18k rows of medical questions from the `cxllin/medinstructv2` dataset.

## Usage

Get started generating text with `StableMed` by using the following code snippet:

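```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cxllin/StableMed-3b"

# Load the tokenizer and the fine-tuned weights from the same repository.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # allow Hub modeling code (older transformers lack native StableLM support)
    torch_dtype="auto",      # load weights in the dtype stored in the checkpoint
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# The prompt format used during fine-tuning is not documented; a plain question is used here.
inputs = tokenizer("What are the common symptoms of iron-deficiency anemia?", return_tensors="pt").to(device)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```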

## Model Architecture

The model is a decoder-only transformer with the dimensions given in the table below. Its architecture is similar to LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)), with the modifications listed after the table.

| Parameters    | Hidden Size | Layers | Attention Heads | Sequence Length (tokens) |
|---------------|-------------|--------|-----------------|--------------------------|
| 2,795,443,200 | 2560        | 32     | 32              | 4096                     |
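As a rough sanity check on the parameter count in the table, the sketch below sums the weights of a LLaMA-style decoder. It assumes a vocabulary size of 50,304 and an MLP (SwiGLU) width of 6,912. Both values are taken from the base model's published config and are not stated in this card. It also assumes bias-free linear projections, LayerNorm with weight and bias, and untied input/output embeddings. Under these assumptions the breakdown reproduces the figure in the table exactly.

```python
# Hypothetical breakdown of the parameter count for a LLaMA-style decoder.
HIDDEN_SIZE = 2560
NUM_LAYERS = 32
VOCAB_SIZE = 50304         # ASSUMPTION: from the base model's config, not this card
INTERMEDIATE_SIZE = 6912   # ASSUMPTION: SwiGLU MLP width from the base model's config


def count_parameters(hidden: int, layers: int, vocab: int, intermediate: int) -> int:
    """Count weights assuming bias-free attention/MLP projections, a gated
    (SwiGLU) MLP, LayerNorm with weight + bias, and untied embeddings."""
    embeddings = vocab * hidden          # input token embedding matrix
    lm_head = vocab * hidden             # separate (untied) output projection
    attention = 4 * hidden * hidden      # q, k, v and output projections
    mlp = 3 * hidden * intermediate      # gate, up and down projections
    norms = 2 * (2 * hidden)             # two LayerNorms per block, weight + bias
    per_layer = attention + mlp + norms
    final_norm = 2 * hidden              # LayerNorm after the last block
    return embeddings + lm_head + layers * per_layer + final_norm


total = count_parameters(HIDDEN_SIZE, NUM_LAYERS, VOCAB_SIZE, INTERMEDIATE_SIZE)
print(f"{total:,}")  # 2,795,443,200
```

Roughly 91% of the weights sit in the 32 transformer blocks; the two 50,304 × 2,560 embedding matrices account for most of the rest.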

* Position Embeddings: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of each head's embedding dimensions for improved throughput, following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
* Normalization: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms rather than RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)).
* Tokenizer: GPT-NeoX ([Black et al., 2022](https://arxiv.org/abs/2204.06745)).
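To illustrate the partial rotary embeddings described above, here is a minimal NumPy sketch, not the model's actual implementation. With a hidden size of 2560 and 32 heads, each head has 80 dimensions, so rotary embeddings cover the first 20 and the remaining 60 pass through unchanged. The GPT-NeoX-style `rotate_half` pairing and the RoPE base of 10,000 are assumptions; `apply_partial_rotary` and `rotate_half` are hypothetical helper names.

```python
import numpy as np

HIDDEN_SIZE = 2560
NUM_HEADS = 32
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS        # 80 dimensions per head
ROTARY_PCT = 0.25                          # RoPE on the first 25% of each head
ROTARY_DIM = int(HEAD_DIM * ROTARY_PCT)    # 20 rotated dimensions
ROPE_BASE = 10_000.0                       # ASSUMPTION: standard RoPE base


def rotate_half(x: np.ndarray) -> np.ndarray:
    """GPT-NeoX-style helper (assumed): split the last axis into halves
    (x1, x2) and return (-x2, x1)."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


def apply_partial_rotary(x: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Rotate the first ROTARY_DIM dims of each head; leave the rest untouched.

    x: queries or keys, shape (seq_len, num_heads, head_dim).
    positions: integer token positions, shape (seq_len,).
    """
    inv_freq = 1.0 / (ROPE_BASE ** (np.arange(0, ROTARY_DIM, 2) / ROTARY_DIM))
    angles = np.outer(positions, inv_freq)            # (seq_len, ROTARY_DIM // 2)
    emb = np.concatenate([angles, angles], axis=-1)   # (seq_len, ROTARY_DIM)
    cos = np.cos(emb)[:, None, :]                     # broadcast over heads
    sin = np.sin(emb)[:, None, :]

    x_rot, x_pass = x[..., :ROTARY_DIM], x[..., ROTARY_DIM:]
    x_rot = x_rot * cos + rotate_half(x_rot) * sin    # 2-D rotation of each pair
    return np.concatenate([x_rot, x_pass], axis=-1)


rng = np.random.default_rng(0)
q = rng.standard_normal((8, NUM_HEADS, HEAD_DIM))
q_rope = apply_partial_rotary(q, np.arange(8))
print(HEAD_DIM, ROTARY_DIM)                                        # 80 20
print(np.allclose(q_rope[..., ROTARY_DIM:], q[..., ROTARY_DIM:]))  # True
```

Because only 20 of 80 dimensions per head are rotated, the sin/cos work per token is a quarter of full RoPE, which is the throughput motivation cited above.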