---
license: apache-2.0
library_name: femr
tags:
- healthcare
---
# CLMBR-T-Base-Random

This is a CLMBR model with randomly initialized weights and a dummy vocabulary. Its purpose is to test code pipelines and demonstrate how to use CLMBR before applying for access to the official CLMBR release, which was trained on real Stanford Hospital data.

The model architecture is CLMBR-T-Base (144M parameters), as described in [the EHRSHOT paper (Wornow et al. 2023)](https://arxiv.org/abs/2307.02028) and based on the architecture originally developed in [the Clinical Language Modeling Based Representations paper (Steinberg et al. 2021)](https://www.sciencedirect.com/science/article/pii/S1532046420302653).

The weights are random, so **this model has no clinical or research use.**

## Model Details

### Model Description

- **Developed by:** Shah lab @ Stanford University
- **Funded by:** This work was supported in part by the Mark and Debra Leslie Endowment for AI in Healthcare, the Clinical Excellence Research Center at Stanford Medicine, and Technology and Digital Solutions at Stanford Healthcare. MW is supported by an NSF Graduate Research Fellowship. JF was supported in part by a Stanford AIMI-HAI Partnership Grant.
- **Shared by:** Shah lab @ Stanford University
- **Model type:** CLMBR [(Steinberg et al. 2021)](https://www.sciencedirect.com/science/article/pii/S1532046420302653)
- **Language(s) (NLP):** Electronic health record codes
- **License:** Apache-2.0
- **Finetuned from model:** N/A -- trained from scratch

### Model Sources

- **Repository:** [https://github.com/som-shahlab/ehrshot-benchmark/](https://github.com/som-shahlab/ehrshot-benchmark/)
- **Paper:** [EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models](https://arxiv.org/abs/2307.02028)

## Uses

This model generates (random) dense representations for patients based on the structured data within their electronic health record. 

These representations can then be used for downstream tasks such as predicting diagnoses, detecting anomalies, or performing propensity score matching for causal inference.
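
For example, one common pattern is to fit a simple linear probe on top of the patient representations for a downstream prediction task. The sketch below is illustrative only and assumes scikit-learn is installed in addition to the packages listed in the setup instructions; `patient_reprs`, `labels`, and the representation dimensionality are synthetic placeholders, not outputs of this model.

```python
# Illustrative linear probe on (placeholder) patient representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
patient_reprs = rng.normal(size=(200, 768))  # placeholder (num_patients, hidden_dim) matrix
labels = rng.integers(0, 2, size=200)        # placeholder binary outcome labels

X_train, X_test, y_train, y_test = train_test_split(
    patient_reprs, labels, test_size=0.2, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```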

Again, please note that **this version of the model has random weights**, so its outputs are meaningless.


## How to Get Started with the Model

Use the code below to get started with the model.

First, install the necessary libraries.
```bash
# Create Python 3.10 environment
conda create --name ehrshot_env python=3.10 -y
conda activate ehrshot_env

# Install requirements
pip install torch==2.1.1 femr==0.2.0 datasets==2.15.0 flash_attn==2.3.6 transformers==4.35.2
```
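
To quickly confirm that the pinned versions resolved correctly (an optional sanity check, not part of the original setup steps), you can print the installed package versions:

```python
# Optional sanity check: print the installed versions of the pinned packages.
from importlib.metadata import version

for pkg in ["torch", "femr", "datasets", "transformers"]:
    print(pkg, version(pkg))
```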

Second, use the following Python script to run inference on a single patient:
```python
import datetime

import torch

import femr.models.dataloader
import femr.models.tokenizer
import femr.models.transformer

model_name = "StanfordShahLab/clmbr-t-base-random"

# Load tokenizer / batch loader
tokenizer = femr.models.tokenizer.FEMRTokenizer.from_pretrained(model_name)
batch_processor = femr.models.dataloader.FEMRBatchProcessor(tokenizer)

# Load model
model = femr.models.transformer.FEMRModel.from_pretrained(model_name)

# Create an example patient to run inference on
example_patient = {
    'patient_id': 30,
    'events': [{
        'time': datetime.datetime(2011, 5, 8),
        'measurements': [
            {'code': 'SNOMED/1'},
        ],
    },
    {
        'time': datetime.datetime(2012, 6, 9),
        'measurements': [
            {'code': 'SNOMED/30'},
            {'code': 'SNOMED/103'}
        ],
    }]
}
batch = batch_processor.convert_patient(example_patient, tensor_type="pt")

# Run model
with torch.no_grad():
    patient_ids, times, reprs = model(batch)
    print(patient_ids)
    print(times)
    print(reprs)
```
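
Assuming the rows of `reprs` are aligned with `patient_ids` and `times` (an assumption here, not something this card specifies), one way to obtain a single vector per patient is to take the representation at the final timestamp. A minimal continuation of the script above:

```python
# Assumption: `reprs` is a 2D tensor with one row per timestamp in chronological
# order, so the last row corresponds to the most recent event in the record.
# The exact output layout may differ across femr versions.
patient_repr = reprs[-1]
print(patient_repr.shape)

# Convert to NumPy if you plan to feed it into scikit-learn or similar tooling.
patient_repr_np = patient_repr.numpy()
```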

## Training Details

This model was not trained; its weights are randomly initialized.

## Evaluation

None, as the weights are random.

## Technical Specifications

Please see [the EHRSHOT paper (Wornow et al. 2023)](https://arxiv.org/abs/2307.02028) for details on the model architecture and objective.

### Compute Infrastructure

This model was not trained.

#### Software

For data loading / processing, this model leverages [FEMR](https://github.com/som-shahlab/femr/tree/main), a Python library for doing machine learning on EHR data at scale.

## Citation

**BibTeX:**

```
@article{wornow2023ehrshot,
  title={EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models}, 
  author={Michael Wornow and Rahul Thapa and Ethan Steinberg and Jason Fries and Nigam Shah},
  year={2023},
  eprint={2307.02028},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## Model Card Authors

Michael Wornow, Ethan Steinberg, Rahul Thapa, Jason Fries, Nigam H. Shah

## Model Card Contact

Michael Wornow ([email protected])