library_name: transformers
language:
- en
base_model: microsoft/phi-2
pipeline_tag: text-generation
Model Card for Model ID
This model is a language tool for scientific research. It analyzes clinical trial abstracts and assigns each sentence to one of four sections: Background, Methods, Results, and Conclusion. This makes it faster for researchers to understand and organize the key information in clinical studies.
Model Details
- Developed by: Salvatore Saporito
- Language(s) (NLP): English
- Finetuned from model: https://huggingface.co/microsoft/phi-2
Model Sources
- Repository: Coming soon
Uses
Automatic identification of sections in (clinical trial) abstracts.
How to Get Started with the Model
Prompt Format:
'''
###Unstruct:
{abstract}
###Struct:
'''
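The prompt format above can be filled in with a small helper. This is a minimal sketch: the function name `build_prompt` is hypothetical, and the exact whitespace (single newlines between the markers and a trailing newline after `###Struct:`) is an assumption, since the card does not specify it.

```python
def build_prompt(abstract: str) -> str:
    """Wrap an unstructured abstract in the prompt format shown above.

    Assumption: markers and abstract are separated by single newlines and
    the prompt ends with a newline, so the model continues on a fresh line.
    """
    return f"###Unstruct:\n{abstract.strip()}\n###Struct:\n"


if __name__ == "__main__":
    example = "Aspirin was compared with placebo. Pain scores were lower with aspirin."
    print(build_prompt(example))
```

The resulting string would then be tokenized and passed to the model's text-generation call; the generated continuation after `###Struct:` is the structured abstract.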
Training Details
Training Data
50k randomly sampled abstracts of randomized clinical trials published between 1970 and 2023. Abstracts were retrieved from MEDLINE using Biopython.
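Retrieval of this kind might look roughly like the sketch below. It is not the author's script: the query string, the per-year search, the helper names, and the fixed random seed are all assumptions. Only `Bio.Entrez` and `Bio.Medline` come from Biopython; they are imported inside the functions so the pure-Python helpers can be used without network access.

```python
import random


def build_query(year: int) -> str:
    """PubMed query for RCT records that have an abstract, for one year (assumed query)."""
    return f"randomized controlled trial[pt] AND hasabstract AND {year}[dp]"


def sample_pmids(pmids, k, seed=0):
    """Draw up to k PMIDs uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(pmids), min(k, len(pmids)))


def search_pmids(year: int, email: str, retmax: int = 10000):
    """Return PMIDs matching the query for one year (requires network access)."""
    from Bio import Entrez  # Biopython, imported lazily

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="pubmed", term=build_query(year), retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])


def fetch_abstracts(pmids, email: str):
    """Download MEDLINE records and keep the abstract text (field 'AB')."""
    from Bio import Entrez, Medline

    Entrez.email = email
    handle = Entrez.efetch(
        db="pubmed", id=",".join(pmids), rettype="medline", retmode="text"
    )
    records = list(Medline.parse(handle))
    handle.close()
    return [rec["AB"] for rec in records if "AB" in rec]
```

A full run would loop over the years 1970 to 2023, pool the PMIDs, sample from the pool, and fetch the records in batches.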
Training Procedure
(Unstructured, structured) pairs were generated from abstracts that were already structured. Each pair was then formatted into a dedicated prompt for causal language modelling (CAUSAL_LM).
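One way to picture the pair generation is sketched below. The input representation (a list of `(label, text)` tuples), the `Label: text` layout of the structured target, and the helper names are all hypothetical; the card does not describe the actual target format.

```python
# Prompt template from the "Prompt Format" section; whitespace is an assumption.
PROMPT_TEMPLATE = "###Unstruct:\n{abstract}\n###Struct:\n"


def make_pair(sections):
    """Turn a structured abstract into an (unstructured, structured) pair.

    `sections` is a list of (label, text) tuples, e.g. ("Methods", "...").
    The unstructured side drops the labels; the structured side keeps them
    in a hypothetical "Label: text" layout, one section per line.
    """
    unstructured = " ".join(text for _, text in sections)
    structured = "\n".join(f"{label}: {text}" for label, text in sections)
    return unstructured, structured


def make_training_text(sections):
    """Concatenate prompt and target into one causal-LM training example."""
    unstructured, structured = make_pair(sections)
    return PROMPT_TEMPLATE.format(abstract=unstructured) + structured


if __name__ == "__main__":
    demo = [
        ("Background", "Drug A may reduce pain."),
        ("Methods", "We randomized 100 patients."),
    ]
    print(make_training_text(demo))
```

During training the model sees the whole string and learns to continue after `###Struct:` with the labeled sections.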
Training Hyperparameters
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True)
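For context, a quantization config like this is typically passed to `from_pretrained` when loading the base model. The sketch below assumes the base checkpoint `microsoft/phi-2` named in this card, `device_map="auto"`, and a hypothetical helper name `load_quantized_base`; it is not the author's training script. The heavy imports live inside the function, so defining it does not require a GPU.

```python
BASE_MODEL = "microsoft/phi-2"  # base checkpoint named in this card


def load_quantized_base(model_name: str = BASE_MODEL):
    """Load the base model in 4-bit NF4 with the settings listed above."""
    # Imported lazily: these need torch, transformers, bitsandbytes, accelerate.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store weights in 4 bits
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bfloat16
        bnb_4bit_use_double_quant=True,         # also quantize the quant constants
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",  # assumption: let accelerate place the weights
    )
    return model, tokenizer
```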
Evaluation
Testing Data, Factors & Metrics
Testing Data
10k randomly sampled RCT abstracts published between 1970 and 2023.
Metrics
Results
Summary
Technical Specifications
Model Architecture and Objective
LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense',
        'fc1',
        'fc2',
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
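To give a feel for what this configuration implies, the sketch below estimates the number of trainable LoRA parameters and the update scaling factor. The phi-2 dimensions (hidden size 2560, MLP size 10240, 32 layers) are assumptions taken from the base model's published configuration, not from this card, and the estimate assumes each target name matches exactly one linear layer per transformer block.

```python
# Assumed phi-2 dimensions (not stated in this card).
HIDDEN = 2560
INTERMEDIATE = 10240
NUM_LAYERS = 32

# Values from the LoraConfig above.
R = 16
LORA_ALPHA = 32

# (in_features, out_features) of each targeted linear layer in one block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, INTERMEDIATE),
    "fc2": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r); no bias terms with bias='none'."""
    return r * (d_in + d_out)


def total_trainable(shapes=TARGET_SHAPES, r=R, layers=NUM_LAYERS) -> int:
    """Sum the adapter parameters over all targeted layers and blocks."""
    per_block = sum(lora_params(i, o, r) for i, o in shapes.values())
    return per_block * layers


# The low-rank update B @ A is multiplied by lora_alpha / r before being added.
SCALING = LORA_ALPHA / R

if __name__ == "__main__":
    print(f"trainable LoRA parameters: {total_trainable():,}")
    print(f"update scaling (lora_alpha / r): {SCALING}")
```

Under these assumed dimensions the adapters come to about 23.6 million trainable parameters, a small fraction of the base model, which is consistent with training on a single 24 GB GPU.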
Compute Infrastructure
Hardware
1 x RTX4090 - 24 GB
Software
torch einops transformers bitsandbytes accelerate peft