metadata
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
tags:
- clinical trial
- foundation model
Model Card for Panacea-7B-Chat
The Panacea-7B-Chat is a foundation model for clinical trial search, summarization, design, and recruitment. It was equipped with clinical knowledge by being trained on 793,279 clinical trial design documents worldwide and 1,113,207 clinical study papers. It shows superior performances than various open-sourced LLMs and medical LLMs on clinical trial tasks.
For full details of this model please read our paper.
Model Training
Panacea is trained from Mistral-7B-v0.1. The training of Panacea consists of an alignment step and an instruction-tuning step.
- Alignment step: continued pre-training on a large collection of trial documents and trial-related scientific papers. This step adapts Panacea to the vocabulary commonly used in clinical trials.
- Instruction-tuning step: further enables Panacea to comprehend the user explanation of the task definition and the output requirement.
Load the model in the following way (same as Mistral):
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = 'linjc16/Panacea-7B-Chat'
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
Citation
If you find our paper or models helpful, please consider cite as follows:
@article{lin2024panacea,
title={Panacea: A foundation model for clinical trial search, summarization, design, and recruitment},
author={Lin, Jiacheng and Xu, Hanwen and Wang, Zifeng and Wang, Sheng and Sun, Jimeng},
journal={arXiv preprint arXiv:2407.11007},
year={2024}
}