---
library_name: transformers
language:
- en
base_model: microsoft/phi-2
pipeline_tag: text-generation
tags:
- medical
- pubmed
- clinical trials
- scientific literature
widget:
- text: "'###Unstruct:\nKawasaki disease (KD) is a systemic vasculitis that causes abnormalities in the coronary arteries. Interleukin (IL)-41 is a novel immunoregulatory cytokine involved in the pathogenesis of some inflammatory and immune-related diseases. However, the role of IL-41 in KD is unclear. The purpose of this study was to detect the expression of IL-41 in the plasma of children with KD and its relationship with the disease.\nA total of 44 children with KD and 37 healthy controls (HC) were recruited for this study. Plasma concentrations of IL-41 were determined by ELISA. Correlations between plasma IL-41 levels and KD-related clinical parameters were analyzed by Pearson correlation and multivariate linear regression analysis. Receiver operating characteristic curve analysis was used to assess the clinical value of IL-41 in the diagnosis of KD.\nOur results showed that plasma IL-41 levels were significantly elevated in children with KD compared with HC. Correlation analysis demonstrated that IL-41 levels were positively correlated with D-dimer and N-terminal pro-B-type natriuretic peptide, and negatively correlated with IgM, mean corpuscular hemoglobin concentration, total protein, albumin and pre-albumin. Multivariable linear regression analysis revealed that IgM and mean corpuscular hemoglobin concentrations were associated with IL-41. Receiver operating characteristic curve analysis showed that the area under the curve of IL-41 was 0.7101, with IL-41 providing 88.64 % sensitivity and 54.05 % specificity.\nOur study indicated that plasma IL-41 levels in children with KD were significantly higher than those in HC, and may provide a potential diagnostic biomarker for KD.\n###Struct:\n"
---
![](ft_sections.png)
A small language model designed for scientific research applications. Phi-2 was fine-tuned to analyze randomized clinical trial abstracts and classify their sentences into four key sections: Background, Methods, Results, and Conclusion.
This model helps researchers understand and organize key information from clinical studies.
## Model Details
The publication rate of Randomized Controlled Trials (RCTs) is consistently increasing,
with more than 1 million RCTs already published.
Approximately half of these publications are listed in PubMed,
posing a significant data-volume challenge for medical researchers seeking specific information.
When searching for prior studies, such as for writing systematic reviews,
researchers often skim through abstracts to quickly determine if the papers meet their criteria of interest.
This task is facilitated when abstracts are structured, meaning the text within an abstract is organized under semantic headings
like objective, method, result, and conclusion.
However, more than half of the RCT abstracts published are unstructured, complicating the rapid identification of relevant information.
This model classifies each sentence of an abstract into its corresponding 'canonical' section, which speeds up the process of locating the desired information.
This classification not only aids researchers but may also benefit other downstream applications, including automatic text summarization, information extraction, and information retrieval.
- **Developed by:** Salvatore Saporito
- **Language(s) (NLP):** English
- **Finetuned from model:** https://huggingface.co/microsoft/phi-2
### Model Sources
- **Repository:** Coming soon
## Uses
Automatic identification of sections in (randomized clinical trial) abstracts.
## How to Get Started with the Model
Prompt Format:
'''
###Unstruct:
{abstract}
###Struct:
'''
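The template above can be filled in with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, not part of the released code, and the trailing newline after `###Struct:` is an assumption based on the widget example in this card's metadata.

```python
def build_prompt(abstract: str) -> str:
    """Wrap an unstructured abstract in the prompt template (hypothetical helper)."""
    # Strip surrounding whitespace so each marker sits on its own line.
    return f"###Unstruct:\n{abstract.strip()}\n###Struct:\n"


prompt = build_prompt("Kawasaki disease (KD) is a systemic vasculitis.")
print(prompt)
```

The resulting string can be passed to the tokenizer as `prompt`.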
Usage:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig
from peft import PeftModel
# Model identifiers
tokenizer_name = "microsoft/phi-2"
basemodel_name = "microsoft/phi-2"
model_id = "SaborDay/Phi2_RCT1M-ft-heading"
# Load base model weights and tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name,trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(basemodel_name, device_map='auto', trust_remote_code=True)
#Load adapter
fine_tuned_model = PeftModel.from_pretrained(model, model_id)
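# Build the prompt (replace the placeholder with the abstract text)
abstract = "..."
prompt = f"###Unstruct:\n{abstract}\n###Struct:\n"
# Tokenize and move the tensors to the model's device
inputs = tokenizer(prompt,
                   return_tensors="pt",
                   return_attention_mask=True,
                   padding=False,
                   truncation=True).to(model.device)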
#Run inference
outputs = fine_tuned_model.generate(**inputs, max_length=1000)
# Decode output
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)
Usage (with quantization):
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
bnb_4bit_quant_type='nf4',
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True)
[...]
model = AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)
[...]
# The quantization config applies to the base model; the adapter is loaded on top of it
fine_tuned_model = PeftModel.from_pretrained(...)
Example:
Application on unseen data
PROMPT: '###Unstruct:\nKawasaki disease (KD) is a systemic vasculitis that causes abnormalities in the coronary arteries.
Interleukin (IL)-41 is a novel immunoregulatory cytokine involved in the pathogenesis of some inflammatory and immune-related diseases.
However, the role of IL-41 in KD is unclear.
The purpose of this study was to detect the expression of IL-41 in the plasma of children with KD and its relationship with the disease.
A total of 44 children with KD and 37 healthy controls (HC) were recruited for this study. Plasma concentrations of IL-41 were determined by ELISA.
Correlations between plasma IL-41 levels and KD-related clinical parameters were analyzed by Pearson correlation and multivariate linear regression analysis.
Receiver operating characteristic curve analysis was used to assess the clinical value of IL-41 in the diagnosis of KD.
Our results showed that plasma IL-41 levels were significantly elevated in children with KD compared with HC.
Correlation analysis demonstrated that IL-41 levels were positively correlated with D-dimer and N-terminal pro-B-type natriuretic peptide, and negatively correlated with IgM, mean corpuscular hemoglobin concentration, total protein, albumin and pre-albumin. Multivariable linear regression analysis revealed that IgM and mean corpuscular hemoglobin concentrations were associated with IL-41. Receiver operating characteristic curve analysis showed that the area under the curve of IL-41 was 0.7101, with IL-41 providing 88.64 % sensitivity and 54.05 % specificity.
Our study indicated that plasma IL-41 levels in children with KD were significantly higher than those in HC, and may provide a potential diagnostic biomarker for KD.
###Struct:'
Answer Phi2_RCT1M-ft-heading:
BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that causes abnormalities in the coronary arteries.
Interleukin (IL)-41 is a novel immunoregulatory cytokine involved in the pathogenesis of some inflammatory and immune-related diseases.
However, the role of IL-41 in KD is unclear.
The purpose of this study was to detect the expression of IL-41 in the plasma of children with KD and its relationship with the disease.
METHODS: A total of 44 children with KD and 37 healthy controls (HC) were recruited for this study.
Plasma concentrations of IL-41 were determined by ELISA.
Correlations between plasma IL-41 levels and KD-related clinical parameters were analyzed by Pearson correlation and multivariate linear regression analysis.
Receiver operating characteristic curve analysis was used to assess the clinical value of IL-41 in the diagnosis of KD.
RESULTS: Our results showed that plasma IL-41 levels were significantly elevated in children with KD compared with HC.
Correlation analysis demonstrated that IL-41 levels were positively correlated with D-dimer and N-terminal pro-B-type natriuretic peptide, and negatively correlated with IgM, mean corpuscular hemoglobin concentration, total protein, albumin and pre-albumin. Multivariable linear regression analysis revealed that IgM and mean corpuscular hemoglobin concentrations were associated with IL-41. Receiver operating characteristic curve analysis showed that the area under the curve of IL-41 was 0.7101, with IL-41 providing 88.64 % sensitivity and 54.05 % specificity.
CONCLUSIONS: Our study indicated that plasma IL-41 levels in children with KD were significantly higher than those in HC, and may provide a potential diagnostic biomarker for KD.
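Output like the answer above can be split into a heading-to-text mapping for downstream use. The following is a minimal sketch, not the author's code: `split_sections` is a hypothetical helper, and the heading list is an assumption taken from this single example (other abstracts may yield different labels).

```python
import re

# Headings seen in the example output; extend this tuple if other labels appear.
HEADINGS = ("BACKGROUND", "METHODS", "RESULTS", "CONCLUSIONS")


def split_sections(generated: str) -> dict[str, str]:
    """Map each heading to its text (hypothetical helper)."""
    # Decoded generations echo the prompt, so keep only the text after the marker.
    body = generated.split("###Struct:", 1)[-1]
    pattern = re.compile(r"^(%s):\s*" % "|".join(HEADINGS), re.MULTILINE)
    matches = list(pattern.finditer(body))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        # Collapse line breaks inside a section into single spaces.
        sections[match.group(1)] = " ".join(body[match.end():end].split())
    return sections


example = (
    "###Unstruct:\nKD is a vasculitis.\nWe recruited 44 children.\n###Struct:\n"
    "BACKGROUND: KD is a vasculitis.\n"
    "METHODS: We recruited 44 children.\n"
)
print(split_sections(example))
```

Splitting on the `###Struct:` marker first keeps the echoed prompt from being parsed as a section.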
## Training Details
### Training Data
50k randomly sampled randomized clinical trial abstracts with publication dates in the range [1970-2023].
Abstracts were retrieved from MEDLINE using Biopython.
### Training Procedure
(Unstructured, structured) text pairs were generated from abstracts that are already structured.
Each pair was then formatted into a dedicated prompt for causal language modelling.
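The pair generation can be sketched as follows. This is an illustration under stated assumptions, not the actual preprocessing code: the input format (a list of `(heading, text)` tuples), the function names, and the exact joining of prompt and target are all hypothetical.

```python
def make_pair(sections: list[tuple[str, str]]) -> tuple[str, str]:
    """Build an (unstructured, structured) pair from labeled sections.

    `sections` is an assumed input format: (heading, text) tuples taken
    from an abstract that is already structured.
    """
    # Unstructured side: the same text with the headings dropped.
    unstructured = "\n".join(text.strip() for _, text in sections)
    # Structured side: each section prefixed by its upper-cased heading.
    structured = "\n".join(f"{head.upper()}: {text.strip()}" for head, text in sections)
    return unstructured, structured


def make_training_text(sections: list[tuple[str, str]]) -> str:
    """Concatenate prompt and target into one causal-LM training string."""
    unstructured, structured = make_pair(sections)
    return f"###Unstruct:\n{unstructured}\n###Struct:\n{structured}"


abstract = [
    ("Background", "KD is a systemic vasculitis."),
    ("Methods", "A total of 44 children were recruited."),
]
print(make_training_text(abstract))
```

Because the target follows the prompt in a single string, a causal language model learns to continue `###Struct:` with the labeled version of the text it has just read.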
#### Training Hyperparameters
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
bnb_4bit_quant_type='nf4',
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True)
#### Training Run metrics
[Run details on W&B](https://wandb.ai/salvatore-saporito-phd/huggingface/runs/5fcnxthk?nw=nwusersalvatoresaporitophd)
## Evaluation
The model was evaluated on a subset of the previously published [PubMed 20k RCT](https://github.com/Franck-Dernoncourt/pubmed-rct/tree/master/PubMed_20k_RCT) abstracts.
Each abstract in the evaluation sample was verified, using its PMID, not to be present in the training set.
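A PMID-based leakage check of this kind can be sketched as a set-membership filter. The record layout (dicts with a `pmid` key), the function name, and the PMIDs below are made up for illustration and do not come from the actual evaluation code.

```python
def drop_overlap(eval_records: list[dict], train_pmids: set[str]) -> list[dict]:
    """Keep only evaluation records whose PMID is absent from the training set."""
    return [rec for rec in eval_records if rec["pmid"] not in train_pmids]


# Illustrative PMIDs only.
train_pmids = {"100", "101"}
eval_records = [
    {"pmid": "101", "text": "seen during training"},
    {"pmid": "202", "text": "unseen abstract"},
]

clean = drop_overlap(eval_records, train_pmids)
print([rec["pmid"] for rec in clean])
```

Storing the training PMIDs in a set keeps each lookup constant-time, which matters when both samples contain tens of thousands of abstracts.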
### Testing Data, Factors & Metrics
#### Testing Data
10k randomly sampled RCT abstracts from the period [1970-2023].
#### Metrics
[WIP]
## Technical Specifications
### Model Architecture and Objective
LoraConfig(
r=16,
lora_alpha=32,
target_modules=['q_proj','k_proj','v_proj','dense','fc1','fc2'],
bias="none",
lora_dropout=0.05,
task_type="CAUSAL_LM",
)
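To get a feel for what this configuration implies, the sketch below estimates the number of trainable LoRA parameters and the LoRA scaling factor `lora_alpha / r`. The Phi-2 dimensions used here (hidden size 2560, MLP width 10240, 32 decoder layers) and the assumption that each target module appears exactly once per layer are not stated in this card and should be checked against the model config.

```python
R = 16
LORA_ALPHA = 32

# Assumed Phi-2 dimensions (not stated in this card); verify against the model config.
HIDDEN = 2560
MLP = 10240
N_LAYERS = 32

# (in_features, out_features) for each targeted module, assumed once per layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, MLP),
    "fc2": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted linear layer."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in TARGETS.values())
total = per_layer * N_LAYERS
scaling = LORA_ALPHA / R

print(f"LoRA scaling factor: {scaling}")
print(f"Estimated trainable LoRA parameters: {total:,}")
```

With `bias="none"`, no bias terms are trained, so the adapter matrices account for all trainable parameters in this estimate.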
### Compute Infrastructure
#### Hardware
1 x RTX4090 - 24 GB
#### Software
pip install torch einops transformers bitsandbytes accelerate peft
## Model Card Contact
Salvatore Saporito - [email protected]
## References
- https://arxiv.org/abs/1710.06071
- https://arxiv.org/abs/2106.09685
- https://arxiv.org/pdf/2309.05463