---
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
tags:
- clinical trial
- foundation model
---

# Model Card for Panacea-7B-Chat

Panacea-7B-Chat is a foundation model for clinical trial search, summarization, design, and recruitment. It acquired clinical knowledge through training on 793,279 clinical trial design documents from around the world and 1,113,207 clinical study papers. It outperforms various open-source LLMs and medical LLMs on clinical trial tasks. For full details of this model, please read our [paper](https://arxiv.org/abs/2407.11007).

## Model Training

Panacea is trained from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). The training of Panacea consists of an alignment step and an instruction-tuning step.

* Alignment step: continued pre-training on a large collection of trial documents and trial-related scientific papers. This step adapts Panacea to the vocabulary commonly used in clinical trials.
* Instruction-tuning step: further enables Panacea to comprehend the user's explanation of the task definition and the output requirements.

Load the model as follows (same as Mistral):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = 'linjc16/Panacea-7B-Chat'

# Load the weights in half precision and spread them across available devices
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

## Citation

If you find our paper or models helpful, please consider citing our work as follows:

```bibtex
@article{lin2024panacea,
  title={Panacea: A foundation model for clinical trial search, summarization, design, and recruitment},
  author={Lin, Jiacheng and Xu, Hanwen and Wang, Zifeng and Wang, Sheng and Sun, Jimeng},
  journal={arXiv preprint arXiv:2407.11007},
  year={2024}
}
```
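
## Example Usage

As a quick illustration of running inference with the loaded model, below is a minimal generation sketch. The prompt text and the generation parameters (`max_new_tokens`, `do_sample`) are illustrative assumptions rather than settings reported in the paper; adjust them for your clinical trial task.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = 'linjc16/Panacea-7B-Chat'
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical prompt for demonstration only, not an example from the paper
prompt = "Summarize the key eligibility criteria considerations for a phase 2 oncology trial."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with a modest token budget; tune these for your use case
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Strip the prompt tokens and print only the newly generated text
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```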