library_name: transformers
language:
  - en
base_model: microsoft/phi-2
pipeline_tag: text-generation
widget:
  - text: >
      '###Unstruct:

      Kawasaki disease (KD) is a systemic vasculitis that causes abnormalities
      in the coronary arteries. Interleukin (IL)-41 is a novel immunoregulatory
      cytokine involved in the pathogenesis of some inflammatory and
      immune-related diseases. However, the role of IL-41 in KD is unclear. The
      purpose of this study was to detect the expression of IL-41 in the plasma
      of children with KD and its relationship with the disease.

      A total of 44 children with KD and 37 healthy controls (HC) were recruited
      for this study. Plasma concentrations of IL-41 were determined by ELISA.
      Correlations between plasma IL-41 levels and KD-related clinical
      parameters were analyzed by Pearson correlation and multivariate linear
      regression analysis. Receiver operating characteristic curve analysis was
      used to assess the clinical value of IL-41 in the diagnosis of KD.

      Our results showed that plasma IL-41 levels were significantly elevated in
      children with KD compared with HC. Correlation analysis demonstrated that
      IL-41 levels were positively correlated with D-dimer and N-terminal
      pro-B-type natriuretic peptide, and negatively correlated with IgM, mean
      corpuscular hemoglobin concentration, total protein, albumin and
      pre-albumin. Multivariable linear regression analysis revealed that IgM
      and mean corpuscular hemoglobin concentrations were associated with IL-41.
      Receiver operating characteristic curve analysis showed that the area
      under the curve of IL-41 was 0.7101, with IL-41 providing 88.64 %
      sensitivity and 54.05 % specificity.

      Our study indicated that plasma IL-41 levels in children with KD were
      significantly higher than those in HC, and may provide a potential
      diagnostic biomarker for KD.

      ###Struct:

This is a small language model designed for scientific research applications. It is fine-tuned to analyze randomized clinical trial abstracts and classify their sentences into four key sections: Background, Methods, Results, and Conclusions. This makes it easier and faster for researchers to understand and organize important information from clinical studies.

Model Details

The publication rate of Randomized Controlled Trials (RCTs) is consistently increasing, with more than 1 million RCTs already published. Approximately half of these publications are listed in PubMed, posing a significant challenge for medical researchers seeking specific information.

When searching for prior studies, such as for writing systematic reviews, researchers often skim through abstracts to quickly determine if the papers meet their criteria of interest. This task is facilitated when abstracts are structured, meaning the text within an abstract is organized under semantic headings like objective, method, result, and conclusion. However, more than half of the RCT abstracts published are unstructured, complicating the rapid identification of relevant information.

By classifying each sentence of an abstract under its corresponding heading, this model can greatly accelerate the process of locating the desired information. This classification not only aids researchers but also benefits various downstream applications, including automatic text summarization, information extraction, and information retrieval.

Model Sources [optional]

  • Repository: Coming soon

Uses

Automatic identification of sections in (randomized clinical trial) abstracts.

How to Get Started with the Model

Prompt Format:

'''
###Unstruct:
{abstract}
###Struct:
'''
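The prompt format above can be assembled with a small helper. This is a minimal sketch: the function name `build_prompt` is hypothetical, and the trailing newline after `###Struct:` follows the example prompt shown later in this card.

```python
def build_prompt(abstract: str) -> str:
    """Wrap an unstructured abstract in the ###Unstruct / ###Struct prompt format."""
    # Surrounding whitespace is stripped so the markers sit on their own lines.
    return f"###Unstruct:\n{abstract.strip()}\n###Struct:\n"


prompt = build_prompt("Kawasaki disease (KD) is a systemic vasculitis.")
print(prompt)
```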

Usage:

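from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer (base model taken from this card's metadata)
base_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Load the fine-tuned adapter weights from the Hub
model_id = "SaborDay/Phi2_RCT1M-ft-heading"
trained_model = PeftModel.from_pretrained(model, model_id)

# Build the prompt and run inference
abstract = "..."  # replace with the unstructured abstract text
prompt = f"###Unstruct:\n{abstract}\n###Struct:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(trained_model.device)
outputs = trained_model.generate(**inputs, max_length=1000)
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)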

Example: Application on unseen data

    PROMPT: '###Unstruct:\nKawasaki disease (KD) is a systemic vasculitis that causes abnormalities in the coronary arteries. Interleukin (IL)-41 is a novel immunoregulatory cytokine involved in the pathogenesis of some inflammatory and immune-related diseases. However, the role of IL-41 in KD is unclear. The purpose of this study was to detect the expression of IL-41 in the plasma of children with KD and its relationship with the disease.\nA total of 44 children with KD and 37 healthy controls (HC) were recruited for this study. Plasma concentrations of IL-41 were determined by ELISA. Correlations between plasma IL-41 levels and KD-related clinical parameters were analyzed by Pearson correlation and multivariate linear regression analysis. Receiver operating characteristic curve analysis was used to assess the clinical value of IL-41 in the diagnosis of KD.\nOur results showed that plasma IL-41 levels were significantly elevated in children with KD compared with HC. Correlation analysis demonstrated that IL-41 levels were positively correlated with D-dimer and N-terminal pro-B-type natriuretic peptide, and negatively correlated with IgM, mean corpuscular hemoglobin concentration, total protein, albumin and pre-albumin. Multivariable linear regression analysis revealed that IgM and mean corpuscular hemoglobin concentrations were associated with IL-41. Receiver operating characteristic curve analysis showed that the area under the curve of IL-41 was 0.7101, with IL-41 providing 88.64 % sensitivity and 54.05 % specificity.\nOur study indicated that plasma IL-41 levels in children with KD were significantly higher than those in HC, and may provide a potential diagnostic biomarker for KD.\n###Struct:\n'

    Answer Phi2_RCT1M-ft-heading:

    BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that causes abnormalities in the coronary arteries. Interleukin (IL)-41 is a novel immunoregulatory cytokine involved in the pathogenesis of some inflammatory and immune-related diseases. However, the role of IL-41 in KD is unclear. The purpose of this study was to detect the expression of IL-41 in the plasma of children with KD and its relationship with the disease. METHODS: A total of 44 children with KD and 37 healthy controls (HC) were recruited for this study. Plasma concentrations of IL-41 were determined by ELISA. Correlations between plasma IL-41 levels and KD-related clinical parameters were analyzed by Pearson correlation and multivariate linear regression analysis. Receiver operating characteristic curve analysis was used to assess the clinical value of IL-41 in the diagnosis of KD. RESULTS: Our results showed that plasma IL-41 levels were significantly elevated in children with KD compared with HC. Correlation analysis demonstrated that IL-41 levels were positively correlated with D-dimer and N-terminal pro-B-type natriuretic peptide, and negatively correlated with IgM, mean corpuscular hemoglobin concentration, total protein, albumin and pre-albumin. Multivariable linear regression analysis revealed that IgM and mean corpuscular hemoglobin concentrations were associated with IL-41. Receiver operating characteristic curve analysis showed that the area under the curve of IL-41 was 0.7101, with IL-41 providing 88.64 % sensitivity and 54.05 % specificity. CONCLUSIONS: Our study indicated that plasma IL-41 levels in children with KD were significantly higher than those in HC, and may provide a potential diagnostic biomarker for KD.
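For downstream use, the generated text can be split back into its sections. The sketch below assumes the model emits exactly the four heading labels seen in the example above (`BACKGROUND`, `METHODS`, `RESULTS`, `CONCLUSIONS`), each followed by a colon. The function name `split_sections` is hypothetical and not part of the model or any library.

```python
import re

# Heading labels seen in the example output; other labels are not handled.
HEADINGS = ("BACKGROUND", "METHODS", "RESULTS", "CONCLUSIONS")
_PATTERN = re.compile(r"\b(" + "|".join(HEADINGS) + r"):\s*")


def split_sections(generated: str) -> dict[str, str]:
    """Split generated text into {heading: text}, dropping any echoed prompt."""
    # Keep only the text after the '###Struct:' marker, if it is present.
    structured = generated.split("###Struct:", 1)[-1]
    # With one capture group, re.split yields [prefix, head1, text1, head2, text2, ...].
    parts = _PATTERN.split(structured)
    return {head: body.strip() for head, body in zip(parts[1::2], parts[2::2])}


sections = split_sections(
    "###Unstruct:\nA. B.\n###Struct:\nBACKGROUND: A. METHODS: B."
)
print(sections)
```

If a heading appears more than once in the output, this sketch keeps only the last occurrence.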

Training Details

Training Data

50k randomly sampled randomized clinical trial abstracts with publication dates in the range 1970-2023. Abstracts were retrieved from MEDLINE using Biopython.

Training Procedure

Structured abstracts were used to generate (unstructured, structured) pairs, and each pair was formatted into a dedicated prompt for causal language modelling (CAUSAL_LM).
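The pair generation can be sketched as follows. This is an illustration under stated assumptions, not the actual preprocessing code: it assumes each structured abstract is available as a list of `(label, text)` tuples, and it mirrors the layout of the example in this card (sections joined by newlines on the unstructured side, upper-case `LABEL:` prefixes on the structured side). The names `make_pair` and `make_training_text` are hypothetical.

```python
def make_pair(sections: list[tuple[str, str]]) -> tuple[str, str]:
    """Turn a structured abstract into an (unstructured, structured) pair."""
    # Unstructured side: section texts only, headings dropped.
    unstructured = "\n".join(text for _, text in sections)
    # Structured side: each section prefixed with its upper-case heading.
    structured = " ".join(f"{label.upper()}: {text}" for label, text in sections)
    return unstructured, structured


def make_training_text(sections: list[tuple[str, str]]) -> str:
    """Format one pair as a single causal-LM training string."""
    unstructured, structured = make_pair(sections)
    return f"###Unstruct:\n{unstructured}\n###Struct:\n{structured}"


example = [("Background", "KD is a vasculitis."), ("Methods", "We used ELISA.")]
print(make_training_text(example))
```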

Training Hyperparameters

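import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)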

Evaluation

Testing Data, Factors & Metrics

Testing Data

10k randomly sampled RCT abstracts published within the period 1970-2023.

Metrics

Coming soon

Technical Specifications [optional]

Model Architecture and Objective

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
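To give a sense of the adapter size this configuration implies, the sketch below counts the LoRA parameters: each adapted linear layer with input width `in_f` and output width `out_f` adds `r * (in_f + out_f)` weights. The layer dimensions (hidden size 2560, MLP width 10240, 32 layers) are assumptions taken from the published phi-2 configuration and are not stated in this card, so the result is an estimate.

```python
R = 16               # LoRA rank, from the configuration above
LORA_ALPHA = 32      # from the configuration above
HIDDEN = 2560        # assumed phi-2 hidden size
INTERMEDIATE = 10240 # assumed phi-2 MLP width
LAYERS = 32          # assumed number of transformer blocks

# (in_features, out_features) of each targeted module within one block.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, INTERMEDIATE),
    "fc2": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_f: int, out_f: int, r: int = R) -> int:
    """Weights in the two low-rank matrices A (r x in_f) and B (out_f x r)."""
    return r * (in_f + out_f)


per_layer = sum(lora_params(i, o) for i, o in shapes.values())
total = per_layer * LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,} (scaling {scaling})")
```

With `bias="none"`, no bias terms are trained, so the count covers only the low-rank matrices.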

Compute Infrastructure

Hardware

1 x RTX4090 - 24 GB

Software

pip install torch einops transformers bitsandbytes accelerate peft 

Model Card Contact

References

https://arxiv.org/abs/1710.06071