---
license: apache-2.0
library_name: femr
tags:
- healthcare
---
# CLMBR-T-Base-Random
This is a CLMBR model with randomly initialized weights and a dummy vocabulary. Its purpose is to let you test code pipelines and see how to use CLMBR before applying for access to the official CLMBR release, which was trained on real Stanford Hospital data.
The model architecture is CLMBR-T-Base (144M params), as described in [the EHRSHOT paper (Wornow et al. 2023)](https://arxiv.org/abs/2307.02028). It is based on the architecture originally developed in [the Clinical Language Modeling Based Representations paper (Steinberg et al. 2021)](https://www.sciencedirect.com/science/article/pii/S1532046420302653).
The weights are random, so **this model has no clinical or research use.**
## Model Details
### Model Description
- **Developed by:** Shah lab @ Stanford University
- **Funded by:** This work was supported in part by the Mark and Debra Leslie Endowment for AI in Healthcare, the Clinical Excellence Research Center at Stanford Medicine, and Technology and Digital Solutions at Stanford Healthcare. MW is supported by an NSF Graduate Research Fellowship. JF was supported in part by a Stanford AIMI-HAI Partnership Grant.
- **Shared by:** Shah lab @ Stanford University
- **Model type:** CLMBR [(Steinberg et al. 2021)](https://www.sciencedirect.com/science/article/pii/S1532046420302653)
- **Language(s) (NLP):** Electronic health record codes
- **License:** Apache-2.0
- **Finetuned from model:** N/A (weights are randomly initialized; the model is not trained)
### Model Sources
- **Repository:** [https://github.com/som-shahlab/ehrshot-benchmark/](https://github.com/som-shahlab/ehrshot-benchmark/)
- **Paper:** [EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models](https://arxiv.org/abs/2307.02028)
## Uses
This model generates (random) dense representations for patients based on the structured data within their electronic health record.
These representations can then be used for downstream tasks such as predicting diagnoses, detecting anomalies, or doing propensity score matching for causal inference.
Again, please note that **this version of the model has random weights,** so its outputs are meaningless.
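As a rough illustration of the downstream step, the sketch below fits a logistic-regression probe on top of fixed patient representations. Everything in it is an assumption made for illustration: the vectors are random stand-ins rather than real model output, the labels are made up, and `predict_proba` and `train_logistic_probe` are hypothetical helpers that are not part of FEMR. It uses only the standard library so it runs anywhere; in practice a library such as scikit-learn would be the usual choice.

```python
import math
import random


def predict_proba(weights, bias, x):
    """Probability of the positive class for one representation vector."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_probe(features, labels, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Stand-in data (assumption): 40 patients, 8-dimensional random vectors,
# and random binary labels in place of a real diagnosis outcome.
rng = random.Random(0)
reprs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(40)]
labels = [rng.randint(0, 1) for _ in range(40)]

weights, bias = train_logistic_probe(reprs, labels)
probs = [predict_proba(weights, bias, x) for x in reprs]
print(f"first predicted probability: {probs[0]:.3f}")
```

Because the inputs here are random, any apparent fit is noise. The same applies to representations from this randomly initialized model.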
## How to Get Started with the Model
Use the code below to get started with the model.
First, install the required libraries.
```bash
# Create Python 3.10 environment
conda create --name ehrshot_env python=3.10 -y
conda activate ehrshot_env
# Install requirements
pip install torch==2.1.1 femr==0.2.0 datasets==2.15.0 flash_attn==2.3.6 transformers==4.35.2
```
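Since the command above pins exact versions, it can help to confirm that the active environment actually matches them before running inference. The following is a minimal sketch using only the standard library; `PINNED` and `find_version_mismatches` are hypothetical names introduced here, and the version numbers are copied from the `pip install` line.

```python
from importlib import metadata

# Pins copied from the pip command above.
PINNED = {
    "torch": "2.1.1",
    "femr": "0.2.0",
    "datasets": "2.15.0",
    "flash_attn": "2.3.6",
    "transformers": "4.35.2",
}


def find_version_mismatches(pinned):
    """Return {package: (expected, found)} for each package that is missing
    or installed at a different version; found is None when not installed."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_version_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If nothing is printed, every pinned package was found at the expected version.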
Second, use the following Python script to run inference on a single patient:
```python
import femr.models.transformer
import torch
import femr.models.tokenizer
import femr.models.dataloader
import datetime
model_name = "StanfordShahLab/clmbr-t-base-random"
# Load tokenizer / batch loader
tokenizer = femr.models.tokenizer.FEMRTokenizer.from_pretrained(model_name)
batch_processor = femr.models.dataloader.FEMRBatchProcessor(tokenizer)
# Load model
model = femr.models.transformer.FEMRModel.from_pretrained(model_name)
# Create an example patient to run inference on
example_patient = {
    'patient_id': 30,
    'events': [{
        'time': datetime.datetime(2011, 5, 8),
        'measurements': [
            {'code': 'SNOMED/1'},
        ],
    },
    {
        'time': datetime.datetime(2012, 6, 9),
        'measurements': [
            {'code': 'SNOMED/30'},
            {'code': 'SNOMED/103'}
        ],
    }]
}
batch = batch_processor.convert_patient(example_patient, tensor_type="pt")
# Run model
with torch.no_grad():
    patient_ids, times, reprs = model(batch)
    print(patient_ids)
    print(times)
    print(reprs)
```
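The example patient in the script is a plain dictionary with a `patient_id` and a list of `events`, each holding a `time` and a list of `measurements` with a `code`. When adapting the script to your own data, a quick structural check before conversion can catch malformed input early. The sketch below is one possible check: `check_patient` is a hypothetical helper, not part of FEMR, and the rules it enforces (integer ID, chronologically ordered events, non-empty string codes) are assumptions inferred from the example rather than documented FEMR requirements.

```python
import datetime


def check_patient(patient):
    """Return a list of problems found in a patient dict (empty if none)."""
    problems = []
    if not isinstance(patient.get("patient_id"), int):
        problems.append("patient_id should be an int")

    events = patient.get("events") or []
    if not events:
        problems.append("patient has no events")

    previous_time = None
    for i, event in enumerate(events):
        time = event.get("time")
        if isinstance(time, datetime.datetime):
            # Assumption: events are expected in chronological order.
            if previous_time is not None and time < previous_time:
                problems.append(f"event {i}: time is earlier than the previous event")
            previous_time = time
        else:
            problems.append(f"event {i}: time should be a datetime.datetime")

        measurements = event.get("measurements") or []
        if not measurements:
            problems.append(f"event {i}: no measurements")
        for j, measurement in enumerate(measurements):
            code = measurement.get("code")
            if not isinstance(code, str) or not code:
                problems.append(f"event {i}, measurement {j}: code should be a non-empty string")
    return problems


# Same shape as the example patient used in the inference script.
good_patient = {
    "patient_id": 30,
    "events": [
        {"time": datetime.datetime(2011, 5, 8),
         "measurements": [{"code": "SNOMED/1"}]},
        {"time": datetime.datetime(2012, 6, 9),
         "measurements": [{"code": "SNOMED/30"}, {"code": "SNOMED/103"}]},
    ],
}
print(check_patient(good_patient))  # prints []
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a batch of patients at once.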
## Training Details
This model is not trained.
## Evaluation
None, as the weights are random.
## Technical Specifications
Please see [the EHRSHOT paper (Wornow et al. 2023)](https://arxiv.org/abs/2307.02028) for details on the model architecture and objective.
### Compute Infrastructure
This model was not trained.
#### Software
For data loading / processing, this model leverages [FEMR](https://github.com/som-shahlab/femr/tree/main), a Python library for doing machine learning on EHR data at scale.
## Citation
**BibTeX:**
```bibtex
@article{wornow2023ehrshot,
    title={EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models},
    author={Michael Wornow and Rahul Thapa and Ethan Steinberg and Jason Fries and Nigam Shah},
    year={2023},
    eprint={2307.02028},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
## Model Card Authors
Michael Wornow, Ethan Steinberg, Rahul Thapa, Jason Fries, Nigam H. Shah
## Model Card Contact
Michael Wornow ([email protected])